Modeling and Results Supplement

5/27/2021

Purpose of the Document

The aim of these supplementary details is to provide readers with better documentation of the complete modeling and model selection process. A total of 18 models were fit as part of this study, and the analyses and supporting figures needed to understand this process are simply too numerous for a single document. This document therefore provides relevant plots and summaries to help readers understand the steps taken to arrive at the final model, and, perhaps more importantly, it provides commentary on how each modeling choice was made.

The goal of this document is therefore to ensure complete transparency in the data analysis process. Accompanying this document is the R script file that was created during the study. Someone who acquires access to the appropriate files through the HRS should be able to run the script file (with some minimal alterations to ensure that the data are read from the correct file paths) and obtain results that match those shown here (within some margin of error due to randomness in some of the methods, though the use of the same seed in R and brms should reduce this impact).

Note that, by default, the code used to generate the results in this document is not shown. As some readers may be interested in this code, its display can be turned on by toggling the switch in the top right of this page. Every output in the document also has a toggle button (a small button with the word “code”) that, when pressed, shows the code used to produce that output. Additionally, the raw Markdown file (*.rmd) is available on GitHub with the R script.

Overview of Sections

In an attempt to facilitate readability of this document, the modeling process is broken down into a few different sections. Additionally, there is a table of contents that readers can use to jump to any section, so for those interested in specific aspects of this supplementary material, it is useful to know what each section contains.

First, a flowchart is provided showing the order in which models were built and the model against which each was tested. Each candidate model was compared to the best-fitting model up to that point. The aim of this flowchart is to help readers become familiar with the models and modeling steps quickly and efficiently as the remaining sections add greater detail.

Second, the priors used in the models are explored. As the data analyses were done in the Bayesian framework, inspection of the priors is useful. The priors are shown along with the prior predictive checks for an intercept-only model.

Third, the details of each model are provided. This is the largest section, as each model includes a variety of explorations. To help reduce the overall length of the document, each model is given its own tab so that readers can view one model at a time. Details for each model are broken down into three primary sections: model validity, model performance, and model estimates.

  • Model validity refers to tests of whether the estimation process converged and was not subject to any issues that would make estimates from the model invalid or unstable. These tests include visual inspection of chain mixing, \(\hat{R}\), effective sample size, and maximum treedepth.
  • Model performance refers to the posterior predictive checks that mirror those shown in the manuscript for the final model: predictions of all responses for all items, responses to each item, and responses to all items for a subset of participants. As the space of this document is less limited than space in the final manuscript, the random subset of participants is increased from 6 to 20.
  • Model estimates refers to the summary of model parameters such as fixed and random effects estimates. This summary differs from the one presented in the manuscript for the final model, as the objective of these intermediary models is not to summarize effect sizes or the probability of these effects; instead, the goal is to get a general idea of what each model is estimating and how it is performing. Toward this end, conditional effects plots for each model are also included. Note that these plots may not be very informative for the majority of the models tested because only a subset of models included many covariates.

Fourth, the results of the model comparisons are provided. These are the formal comparisons of the leave-one-out information criterion (LOOIC) and pseudo-Bayesian model averaging (pseudo-BMA) tests. Not all models were compared to one another, so it can be helpful to return to the flowchart to see the model comparisons that were made.

Finally, as discussed in the manuscript, some additional details regarding the final model are also included here. These details include summaries of the item parameters in traditional IRT metrics (i.e., as difficulty and discrimination), the reliability plot, expected score functions (total test and first trial), and test information functions (total test and first trial). The same caveat as in the manuscript is made here: the item parameters are estimated assuming the local dependency effect is 0 for every item on all three trials. This is not a realistic assumption, but these effects are unique for every response pattern, making summarizing the various effects overwhelming. The test is best understood as a dynamic one in which a person’s performance on each trial changes our expectation for how they will perform on the next; however, a gross simplification that captures the general expectation of how items will perform can be obtained by ignoring this dynamic element (as done here).

What’s Being Done

As mentioned earlier, this document shows all the code used to generate the results. Since there is an accompanying R data script, it may be useful for readers to know the objects being called in this Markdown document since those objects can be connected back to the R script. The hope is that this will create a reasonable sense of cohesion between the supplementary materials, and it should mean that all the results here are also fully reproducible. Toward that end, the objects and packages used in this document are shown below (note that the R objects are read in as RDS files whose names are consistent with those listed in the R script file).

#read in needed data
df_long <- readRDS("Data_long.rds")
Rasch_prior <- readRDS("Fitted Models/1PL_prior_check.rds")
TwoPL_prior <- readRDS("Fitted Models/2PL_prior_check.rds")
Rasch_inter <- readRDS("Fitted Models/1PL_intercept.rds")
TwoPL_inter <- readRDS("Fitted Models/2PL_intercept.rds")
TwoPL_ranItemTrialsInt <- readRDS("Fitted Models/2PL_random_items_allThree_int.rds")
TwoPL_ranItemTrialsNoInt <- readRDS("Fitted Models/2PL_random_items_allThree_noInt.rds")
TwoPL_ranItemTrial1Int <- readRDS("Fitted Models/2PL_random_items_trialOne_int.rds")
TwoPL_ranItemTrial1NoInt <- readRDS("Fitted Models/2PL_random_items_trialOne_noInt.rds")
TwoPL_depmd <- readRDS("Fitted Models/2PL_dependency_model.rds")
TwoPL_depea <- readRDS("Fitted Models/2PL_dependency_easinessOnly.rds")
TwoPL_depsi <- readRDS("Fitted Models/2PL_dependency_trialOne.rds")
TwoPL_learn <- readRDS("Fitted Models/2PL_learningModel.rds")
TwoPL_multi <- readRDS("Fitted Models/2PL_multidimensional.rds")
TwoPL_srlps <- readRDS("Fitted Models/2PL_serialPosition.rds")
TwoPL_t3spe <- readRDS("Fitted Models/2PL_serialPosition_trialUnique.rds")
TwoPL_itmex <- readRDS("Fitted Models/2PL_itemCovariates.rds")
TwoPL_itmcp <- readRDS("Fitted Models/2PL_itemCovariates_byTrial.rds")
TwoPL_itmcr <- readRDS("Fitted Models/2PL_itemCovariates_uniqueInteractions.rds")
TwoPL_itmsd <- readRDS("Fitted Models/2PL_itemCovariates_simplified.rds")
TwoPL_itmfn <- readRDS("Fitted Models/2PL_itemCovariates_reduced.rds")

#load in required packages 
#wrapping in suppressPackageStartupMessages() done just to reduce print out in document
suppressPackageStartupMessages(library(brms))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(kableExtra))

Modeling Flowchart

As can be inferred from the figure below, the modeling process involved many iterations of fitting similar models and considering results from earlier models. As there were specific hypotheses regarding the model and its covariates, the models needed to test these hypotheses were naturally created as part of the study. At the same time, alternative models had to be specified against which to test these models. Additionally, while there were hypotheses regarding which covariates (and their signs) would ultimately be in the final model, the important result of the study is the final model itself. Some readers may see the number of models examined and the variations in their specification and become concerned about two potential issues: multiple comparisons with inflated error rates and/or model fishing/p-hacking.

The objective of this supporting document is to help clarify all the modeling choices so that readers do not need to question whether model specifications were made to try to improve performance of the final result. With respect to possible concerns regarding multiple comparisons, Bayesian methods do not suffer from these concerns (Gelman et al., 2013; Gelman, Hill, & Yajima, 2012; Neath, Flores, & Cavanaugh, 2017; Sjölander & Vansteelandt, 2019). While there are several reasons this is the case for Bayesian methods, it is sufficient to speak to three. First, we do not use null hypothesis testing in this study. Model comparisons are completed using a formal comparison of information criteria to select models with better out-of-sample performance. Coefficients are not interpreted as significant or not but are instead summarized in terms of their probability of existing. Since the correction to p-values for multiple comparisons exists to control the risk of falsely rejecting the null hypothesis, this is not a concern when we are not rejecting null hypotheses. Second, we utilize skeptical priors for our effect estimates. This means that we a priori place greater probability on the effects being 0 (or practically equivalent to 0). This is the inverse of frequentist decision-making practices, where the null hypothesis is very easy to reject since it is constrained to (usually) a nil point value, which is rarely a realistic value for any effect. Finally, the models benefit from hierarchical partial pooling of effects, meaning that estimates are pulled away from extreme values and closer to the overall mean. Combined with the skeptical priors, this means that all effect estimates are pulled closer to those a priori small effects.

For anyone concerned that models were fit simply to improve performance, the remainder of this document aims to explain each modeling decision. While some of the model specifications are discussed in detail in the manuscript, specific justifications and rationales for adding or dropping covariates could not be explained there, so this document takes that next step. While every modeling decision was based on trying to find the best-performing model, the study still utilized a priori hypotheses about each predictor, and each predictor was included in order to test these hypotheses. Once these hypotheses are tested, however, it is still important to have a final model that summarizes these findings (i.e., one that includes only the most relevant predictors).

Flowchart of all models fit and compared in order (click to make larger)

Prior Specifications and Inspection

As discussed in the manuscript for this study, prior specification came from documentation on using brms for IRT (i.e., Bürkner, 2020a and Bürkner, 2020b). As a general note, the non-linear specification for the 2PL model used in this study comes from the Bürkner (2020b) study published in the Journal of Intelligence. Also as discussed in the manuscript, the specification of priors follows the recommendations of other typical multilevel regression guides (e.g., Gelman & Hill, 2007). Specifically, the priors are normal distributions with wide variances relative to the scale of the outcome data. As these priors are placed on regression coefficients, normal distributions are appropriate prior distributions. While these distributions are centered on 0, they are made wide and are thus only weakly informative to the final parameter estimates. This specification helps regularize estimation (i.e., pull estimates toward zero and away from more extreme values) while imparting little a priori influence on the estimates. Additionally, by making the priors skeptical (i.e., they place the greatest probability on very small to non-existent effects), there is a reduction in the risk of experimenter bias; however, with 1,219 participants each observed 30 times, the data will dominate the prior anyway.

The priors for the Rasch and 2PL models are shown below:

Rasch_priors <-
  prior("normal(0, 2)", class = "Intercept") +
  prior("normal(0, 3)", class = "sd", group = "ID") + 
  prior("normal(0, 3)", class = "sd", group = "Item")

TwoPL_priors <- 
  prior("normal(0, 2)", class = "b", nlpar = "beta") +
  prior("normal(0, 1)", class = "b", nlpar = "logalpha") +
  prior("normal(0, 1)", class = "sd", group = "ID", nlpar = "theta") + 
  prior("normal(0, 3)", class = "sd", group = "Item", nlpar = "beta") +
  prior("normal(0, 1)", class = "sd", group = "Item", nlpar = "logalpha")

Readers following the R script file will recognize that the above are repeated in that document (lines 310-321). To read these priors, it can be helpful to look at a couple of examples. Starting with the priors for the Rasch model, the prior for the intercept is specified as a normal distribution with a mean of zero and a standard deviation of 2. This means that, before looking at the data, we are guessing that there is a 68% probability of the intercept being between -2 and 2 (i.e., +/- 1SD), and that there is about a 95% probability that the intercept will be between -4 and 4 (i.e., +/- 2SD). Since the model uses a logit transformation, these values correspond to odds ratios of \(e^{\beta_0}\), where \(\beta_0\) is the intercept. Clearly, this range is very large (e.g., ~0.02 - 54.60 at 2SD), which is what makes the prior specification weakly informative. Take another example, this time from the 2PL model and a random effect. The random effect priors are all labeled as class = "sd" since we are putting a prior belief on the plausible values of the standard deviation of the random effects. For the random person effect (i.e., the latent trait of each participant), we look for the variable that defines this group (group = "ID", where ID is a number indexing each participant) and the non-linear element it is estimating (nlpar = "theta", where theta is the traditional IRT symbol for the latent trait). The prior for the latent trait is therefore the standard normal distribution with a mean of zero and a standard deviation of 1. This specification is consistent with the treatment of the latent trait in IRT as normally distributed, though generally in IRT models the variance is constrained to be 1 for identifiability purposes (see Bürkner, 2020a for details).
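To make the implied scale concrete, the odds-ratio range quoted above can be reproduced directly in R (a minimal sketch using only base R; the +/- 2SD bounds mirror the normal(0, 2) intercept prior):

```r
# Odds-ratio range implied by the normal(0, 2) prior on the intercept
prior_sd <- 2
logit_bounds <- c(-2, 2) * prior_sd   # +/- 2SD on the logit (log-odds) scale
or_bounds <- exp(logit_bounds)        # convert log-odds to odds ratios
round(or_bounds, 2)                   # ~0.02 and ~54.60
```

The width of this interval on the odds-ratio scale is what justifies calling a normal(0, 2) prior only weakly informative for a logit model.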

While it’s comforting to be able to go through each prior specification and think about what it means, it is perhaps more efficient to examine some plots. The first set of plots shows the parameter estimates returned when the model samples only from the prior. In other words, these are the estimated effects implied by the priors. If the prior specifications are truly skeptical and weakly informative, then they will give the greatest probability to effects of very small size while covering a wide range of plausible values. These effect estimates for an intercept-only Rasch model are shown below.

plot(Rasch_prior, combo = c("dens", "intervals"))

Consistent with expectations, these plots show wide ranges of plausible values with the greatest probability placed on small effects. The left column shows the density plots of the estimates while the right column shows the interval estimates (circle = mean estimate, bold line = 50% credible interval, thin line = 90% credible interval). The intercept density very clearly follows a normal distribution that allows admissible values while giving more extreme estimates a small overall probability of being true. The standard deviations for the random effects are also appropriate for the data, as they do not give any impossible (i.e., negative) estimates and cover a very wide range of possible values. These wide priors allow the data to dominate the posterior estimates for the parameters, though again this would likely be the case even with more informative priors due to the size of the sample available.

plot(TwoPL_prior, N = 3, combo = c("dens", "intervals"), ask = FALSE)

The plots above mirror those for the Rasch model but are for the 2PL model, which necessarily estimates more parameters. Again, the intercept priors make reasonable sense and the standard deviations all give only appropriate estimates. Note that there is also a prior over the correlation between item easiness (“beta”) and discrimination (“logalpha”). This prior is not specified in the earlier code as the default in brms is the LKJ prior with \(\eta\) = 1, which is the preferred weakly informative prior for a correlation matrix. As is apparent from the above figure, the LKJ prior is close to a uniform distribution bounded by -1 and 1; however, unlike a true uniform distribution, the LKJ prior gives less probability to correlations close to perfect. Thus, it is fairly ideal as a prior for correlations: it is bounded appropriately (i.e., between -1 and 1) and provides little a priori belief/information about the correlations except to say that correlations close to perfect are unlikely, which is almost always a sensible assumption for any data.

Another related graphical output is the prior predictive check. The prior predictive check runs the model using the priors rather than the observed data (note that this model would be similar to a null model). If the priors are specified well, then they should return reasonable, albeit widely varying, estimates of the observed data. The prior predictive checks for the Rasch and 2PL models are shown below following the same layout as the posterior predictive checks in the manuscript and for the other models.

pp_check(Rasch_prior, nsamples = 25, type = "bars")

pp_check(Rasch_prior, nsamples = 25, type = "bars_grouped", group = "Item")

pp_check(Rasch_prior, nsamples = 25, type = "bars_grouped", group = "ID", 
         newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 20, replace = FALSE))))

The Rasch prior predictive checks above demonstrate that the prior specifications are adequately wide to provide coverage of the observed data. The estimates themselves are expectedly poor, but the important part is that their ranges are sufficiently wide to allow the observed data to fall within the credible intervals. The same plots are now repeated for the 2PL model.

pp_check(TwoPL_prior, nsamples = 25, type = "bars")

pp_check(TwoPL_prior, nsamples = 25, type = "bars_grouped", group = "Item")

pp_check(TwoPL_prior, nsamples = 25, type = "bars_grouped", group = "ID", 
         newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 20, replace = FALSE))))

Performance of the 2PL priors is similar to that of the Rasch priors, suggesting that these priors are also appropriately specified.

While all the above plots and theoretical justifications suggest that the priors are specified consistent with the wishes for the model, it can also be helpful to perform a post-hoc test of whether a model’s priors were influential on its final estimates. As has been mentioned multiple times in this section, due to the sample size of this study, it is expected that the data and not the prior will dominate the posterior estimates, meaning that even with more informative priors the data would have more influence on the final estimates. One such comparison discussed by Andrew Gelman (link here) is to compare the posterior standard deviation (i.e., the precision of the effect estimate after looking at the data) to the prior standard deviation (i.e., the uncertainty of the effect estimate before looking at the data). In the case that a prior is influential, the ratio of posterior to prior standard deviation will be large. Put another way, we learn little more about the posterior from observing the data because the prior was already highly informative. Gelman’s recommended threshold for determining whether a prior is informative is if the posterior standard deviation for an effect is more than 0.1 times the prior standard deviation. The table below provides this metric for each predictor from the final model reported in the study. As a note, these are the only parameters examined for sensitivity to the prior as these are the only ones on which inference was conducted.

#get the posterior samples from the final model
posteriors <- posterior_samples(TwoPL_itmfn)

#get the fixed effects for the item easiness
beta <- posteriors %>%
  select(starts_with("b_beta")) %>%
  apply(., 2, function(x) sd(x)/sd(posteriors$prior_b_beta)) %>%
  as.matrix()

#do the same for item discrimination
alpha <- posteriors %>%
  select("b_logalpha_Intercept") %>%
  apply(., 2, function(x) sd(x)/sd(posteriors$prior_b_logalpha)) %>%
  as.matrix()

#combine into a single result
result <- rbind(beta, alpha) %>%
  as.data.frame() %>%
  add_column("Prior Influence" = ifelse(.[, 1] >= 0.1, "Informative", "Uninformative")) %>%
  rename("Ratio (Posterior:Prior)" = V1) %>%
  slice(1, 18, 2:11, 14:15, 17, 16, 12:13)
row.names(result) <- c("Easiness Intercept", "Discrimination Intercept", "Dependency: Butter", "Dependency: Arm", "Dependency: Shore", "Dependency: Letter", "Dependency: Queen", "Dependency: Cabin", "Dependency: Pole", "Dependency: Ticket", "Dependency: Grass", "Dependency: Engine", "HAL Frequency: Trial 1", "HAL Frequency: Trial 2 or 3", "Age of Acquisition: Trial 1 or 2", "Age of Acquisition: Trial 3", "Body-Object Integration", "Concreteness")
rm(posteriors, beta, alpha)

#get resulting table
result %>%
  kable(caption = "Comparison of the Posterior to Prior Distribution Standard Deviations", digits = 4, align = 'cc') %>%
  column_spec(1:3, bold = ifelse(result$`Ratio (Posterior:Prior)` >= 0.10, TRUE, FALSE)) %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Posterior to Prior Distribution Standard Deviations
Ratio (Posterior:Prior) Prior Influence
Easiness Intercept 0.0589 Uninformative
Discrimination Intercept 0.7092 Informative
Dependency: Butter 0.0344 Uninformative
Dependency: Arm 0.0411 Uninformative
Dependency: Shore 0.0351 Uninformative
Dependency: Letter 0.0299 Uninformative
Dependency: Queen 0.0333 Uninformative
Dependency: Cabin 0.0309 Uninformative
Dependency: Pole 0.0314 Uninformative
Dependency: Ticket 0.0293 Uninformative
Dependency: Grass 0.0347 Uninformative
Dependency: Engine 0.0300 Uninformative
HAL Frequency: Trial 1 0.2350 Informative
HAL Frequency: Trial 2 or 3 0.0840 Uninformative
Age of Acquisition: Trial 1 or 2 0.0753 Uninformative
Age of Acquisition: Trial 3 0.1457 Informative
Body-Object Integration 0.0595 Uninformative
Concreteness 0.0552 Uninformative

The table of these posterior and prior comparison results is shown here. For convenience, only those whose ratio exceeds the recommended 0.10 threshold are bolded. Generally, the findings suggest that the priors performed as expected: they were weakly informative and did not have undue influence on the posterior estimates. Notably, the exceptions correspond to parameter estimates whose 95% credible intervals included zero (i.e., those that might be considered non-significant). This is not unexpected, as the result essentially indicates that even after looking at the data we do not change our prior beliefs, which were that the effects were either zero or very small. The influence of the prior on these null findings reflects a point made earlier regarding how Bayesian methods are generally unaffected by multiple comparisons.

An important point to emphasize at this juncture is the implication of a “significant” finding in Bayesian methods. As discussed throughout this section on priors, the priors here are skeptical of an effect in the sense that they place the greatest weight on an effect estimate of zero or close to zero and are ambivalent regarding the direction of the effect (i.e., it is equally probable that the sign is positive or negative). In the context of the current study, this means that, despite the hypotheses regarding the presence and direction of specific effects, the priors for these predictors are specified in this skeptical way so as to avoid the introduction of experimenter bias. In regard to the robustness of the effects observed, the fact that they are supported by the information provided by the data despite these skeptical priors also helps build confidence in the presence of these effects.

Model Details

The details in this section highlight the model fitting and results. These details speak first to the validity of the model results and then to the actual results (i.e., parameter estimates) of each model. Model validity is particularly important in Bayesian methods because the parameter estimates are based on Markov chain Monte Carlo (MCMC) methods (or Hamiltonian Monte Carlo (HMC) in the case of these models run using Stan). In cases where a model fails to converge or throws errors under the estimator, the validity of the model results is questionable, or the results may be completely invalid (e.g., in the case of divergent transitions). To reflect this need to first confirm the validity of the results, various diagnostics of the model fit are provided before the model results are presented.

For readers unfamiliar with these model checks, a brief overview of each is provided here. The largest threat to model results in HMC is arguably the presence of divergent transitions. HMC explores the posterior distribution by simulating the evolution of a Hamiltonian system, and in order to do this efficiently, the sampler finds a reasonable step size with which to explore that space. A divergent transition occurs when the trajectory of the system is lost due to too large a step size. Another important model check is the treedepth of the chains. Again, to improve the efficiency of the posterior sampling, a maximum treedepth is set to prevent the estimator from spending excessive time in certain steps and spaces. Since this treedepth may artificially limit the estimator in exploring the posterior space, it is important to check whether the maximum treedepth was actually hit during estimation (the default treedepth is 10). Another important Bayesian model indicator is \(\hat{R}\), because multiple HMC (and MCMC) chains are needed to ensure that the posterior is sampled appropriately. If a single chain is run, then it is not possible to determine whether the random starting values of this chain may have led to a specific set of parameter estimates. Running multiple independent chains that each have different random starting values helps ensure that the parameter estimates are not biased by exploration of only certain posterior values. In well-behaved models, these chains will mix together without any clear indication of one chain producing a specific set of parameter estimates that differ from what the other chains are estimating. While this mixing of chains can be visually inspected via the trace plot (also provided here), the \(\hat{R}\) statistic is a simple indicator of this, with the conservative recommendation of treating estimates as valid only if the \(\hat{R}\) for the parameter is less than 1.01.
A final model validity check shown here is the effective sample size. Because multiple chains are run for many samples of the posterior, it is expected that some of those samples are autocorrelated and thus dependent on previous samples. Effective sample size informs us of the precision of the parameter estimates in MCMC and HMC methods. When samples are independent, the central limit theorem indicates that the precision with which a parameter can be estimated is proportional to the size of the sample (e.g., \(\sigma_\bar{x} = \frac{\sigma}{\sqrt{N}}\)). The same proportionality can be obtained when samples are dependent but requires replacing \(N\) with \(N_{ESS}\), the effective sample size. Due to the dependence of the sampling, \(N_{ESS} < N\), and thus the precision of the estimate is less than it would be if it could be estimated from the total sample. Running the chains for more iterations will necessarily increase \(N_{ESS}\), but there is a practical trade-off between computational effort and marginal increases in precision. The recommendation of the Stan developers is to run enough iterations of the sampler to obtain an \(N_{ESS} \geq 100 \times N_{chains}\). All models were run using 4 independent chains, so the minimally acceptable ESS is 400 (i.e., 4*100).
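The shrinkage of \(N_{ESS}\) under autocorrelation can be demonstrated with a small simulation (a sketch in base R, not output from the study models; the AR(1) chain and the truncated-autocorrelation ESS formula are illustrative stand-ins for real HMC draws):

```r
# Effective sample size of an autocorrelated chain vs. its nominal length
set.seed(123)
N <- 10000
chain <- as.numeric(arima.sim(list(ar = 0.7), n = N))  # AR(1) "posterior" draws

# N_ESS = N / (1 + 2 * sum of autocorrelations), truncated here at lag 50
rho <- acf(chain, lag.max = 50, plot = FALSE)$acf[-1]
n_ess <- N / (1 + 2 * sum(rho))
n_ess  # far fewer effective draws than the nominal 10000
```

With an AR(1) coefficient of 0.7, roughly five correlated draws carry the information of one independent draw, which is why running more iterations is sometimes needed just to clear the ESS threshold.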

In the case that the model checks are appropriate, it is then appropriate to examine the posterior distribution and begin inference based on the results. While the models estimate more parameters, posterior summaries of the coefficients in each model are shown. The posterior is summarized as a density plot that reflects the probability distribution of the parameter based on integrating our prior knowledge and the observed data. The density plot shows the 95% credible interval with the 80% credible interval shown as a shaded area. Unlike frequentist confidence intervals, these credible intervals can be interpreted as the probability of the parameter having a specific value. For example, if the 95% credible interval ranges from 0.50 to 1.00, then there is a probability of 0.95 that the parameter has a value somewhere within this interval. This is in contrast to frequentist confidence intervals, where the same interval would be interpreted as meaning that 95% of point estimates based on the same statistical test applied to an infinite number of random samples from the same population would fall within this interval. Thus, where the credible interval directly summarizes our beliefs about the parameter and our uncertainty about its true value, the confidence interval only reflects point estimates that we would expect to observe if the study and statistical methods were repeated an infinite number of times. Posterior predictive checks of the models are then also presented as they were for the final model in the corresponding manuscript. There is one additional model exploration plot provided in this section that has not been addressed before in this document: the conditional effects plot. As the predictors in these models are correlated and have their effects estimated on the logit scale, it can be challenging to look at the model estimates and understand the implication of these values in an intuitive manner.
One way to address this is to visualize how the predicted outcome of the model changes as a function of each predictor while holding all other predictors constant (e.g., at their mean values). The resulting plot is the conditional effects plot. In the case of these models, this plot shows, for each predictor, what happens to the predicted probability of a correct response as the value of that predictor changes while all other model values are held constant. These plots are not realistic, as it is not reasonable to assume that there exist words whose traits vary on only one property at a time; however, they do provide a quick way of understanding the relative effect of each predictor by showing its linear trend as implied by the model. As a result, these plots should not be used for prediction or extrapolation in any regard; instead, if the goal is prediction of responses, then the entire model should be used, and extrapolation of these predictions to values not observed in this study should be avoided. These plots serve simply to help contextualize the meaning of each effect estimate in the model.
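The credible-interval interpretation described above can be illustrated with simulated stand-in draws (a sketch only; the mean and standard deviation here are arbitrary and not taken from any fitted model):

```r
# Summarizing a posterior directly from its draws: quantiles ARE the interval
set.seed(42)
draws <- rnorm(8000, mean = 0.75, sd = 0.13)  # stand-in posterior samples

ci95 <- quantile(draws, probs = c(0.025, 0.975))  # 95% credible interval
p_positive <- mean(draws > 0)                     # P(parameter > 0 | data)
```

Because the interval is read directly off the posterior draws, statements like “there is a 0.95 probability the parameter lies in this interval” are licensed in a way they are not for frequentist confidence intervals.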

Rasch Intercept

mcmc_plot(Rasch_inter, type = "nuts_divergence")

mcmc_plot(Rasch_inter, type = "nuts_treedepth")

mcmc_plot(Rasch_inter, type = "trace")
## No divergences to plot.

mcmc_plot(Rasch_inter, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(Rasch_inter, type = "neff_hist", binwidth = 0.1)

The intercept-only Rasch model demonstrated no evidence of estimation problems that would raise concerns about the validity of the results. We can therefore examine the overall results of the model.

mcmc_plot(Rasch_inter, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(Rasch_inter, nsamples = 50, type = "bars")

pp_check(Rasch_inter, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(Rasch_inter, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Even with no predictors and fewer parameters than the 2PL models, the Rasch model does very well in predicting responses. Again, as this is an intercept-only model, there is only one population-level parameter to summarize from the posterior. The density plot of the intercept estimate demonstrates a high probability that the intercept is positive, with a mode around 0.65. The following general model summary integrates basic model validity statistics with posterior summaries for the additional parameters.
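Because the intercept is estimated on the logit scale, it can be helpful to convert it to a probability with the inverse-logit function. As an illustration only, applying this to the posterior mean of 0.66 reported in the summary below:

```r
# Convert the Rasch intercept from the logit scale to a probability.
# 0.66 is the posterior mean reported in summary(Rasch_inter); a full
# analysis would apply this transformation to every posterior draw.
p_avg <- plogis(0.66)  # inverse logit: roughly a 0.66 probability of a
                       # correct response for an average person on an
                       # average item
```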

summary(Rasch_inter)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ 1 + (1 | Item) + (1 | ID) 
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.53      0.02     0.50     0.57 1.00     3575     4956
## 
## ~Item (Number of levels: 10) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.51      0.15     0.31     0.88 1.00     1921     2978
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.66      0.17     0.31     0.99 1.00     1079     1562
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

To help clarify the meaning of the major elements of the above output, consider the following guide:

1. “Estimate” is the mean of the posterior distribution for the parameter.
2. “Est.Error” is the standard deviation of the posterior distribution.
3. “l-95% CI” is the lower bound of the 95% credible interval.
4. “u-95% CI” is the upper bound of the 95% credible interval.
5. “Rhat” is the \(\hat{R}\) value for that parameter (rounded to two decimal places).
6. “Bulk_ESS” is the effective sample size based on rank-normalized draws and estimates the sampling efficiency of the posterior mean.
7. “Tail_ESS” is the minimum of the effective sample sizes for the 5% and 95% quantiles.
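The first four columns are simple functions of the posterior draws. The sketch below reproduces them using randomly generated stand-in draws (not the model's actual posterior, which would be extracted from the fitted brms object):

```r
# How the summary() columns relate to the raw posterior draws.
# These draws are simulated stand-ins for illustration only.
set.seed(1)
draws <- rnorm(8000, mean = 0.66, sd = 0.17)

estimate  <- mean(draws)                       # "Estimate"
est_error <- sd(draws)                         # "Est.Error"
ci95      <- quantile(draws, c(0.025, 0.975))  # "l-95% CI" and "u-95% CI"
```

Rhat and the ESS measures are not simple summaries of the pooled draws; they are computed across chains, as described in the notes at the bottom of the summary output.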

2PL Intercept

mcmc_plot(TwoPL_inter, type = "nuts_divergence")

mcmc_plot(TwoPL_inter, type = "nuts_treedepth")

mcmc_plot(TwoPL_inter, type = "trace")
## No divergences to plot.

mcmc_plot(TwoPL_inter, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_inter, type = "neff_hist", binwidth = 0.1)

The intercept-only 2PL model demonstrated no evidence of estimation problems that would raise concerns about the validity of the results. We can therefore examine the overall results of the model.

mcmc_plot(TwoPL_inter, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_inter, nsamples = 50, type = "bars")

pp_check(TwoPL_inter, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_inter, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Much like the Rasch model, the 2PL intercept-only model does very well in predicting responses. Although this is an intercept-only model, two intercepts are estimated (one for difficulty/beta and one for discrimination/alpha) because the 2PL is specified as a non-linear model. Note that the model’s name for the discrimination parameter is “logalpha.” This name reflects the fact that the alpha parameter was log-transformed to constrain its estimates to positive values. The following general model summary integrates basic model validity statistics with posterior summaries for the additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.
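Because discrimination is estimated on the log scale, the posterior for alpha itself is recovered by exponentiating. As an illustration, applying this to the logalpha_Intercept posterior mean of -0.28 reported in the summary below:

```r
# Back-transform the log-discrimination to the discrimination (alpha) scale.
# -0.28 is the posterior mean of logalpha_Intercept from summary(TwoPL_inter);
# a full analysis would exponentiate draw by draw rather than the mean.
alpha_hat <- exp(-0.28)  # about 0.76; exp() guarantees a positive value
```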

summary(TwoPL_inter)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + (1 | i | Item)
##          logalpha ~ 1 + (1 | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.70      0.46     0.12     1.84 1.00    10470     5582
## 
## ~Item (Number of levels: 10) 
##                                        Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Intercept)                         0.54      0.16     0.32     0.94
## sd(logalpha_Intercept)                     0.93      0.26     0.55     1.56
## cor(beta_Intercept,logalpha_Intercept)    -0.09      0.30    -0.63     0.50
##                                        Rhat Bulk_ESS Tail_ESS
## sd(beta_Intercept)                     1.00     2967     4770
## sd(logalpha_Intercept)                 1.00     4654     5884
## cor(beta_Intercept,logalpha_Intercept) 1.00     5219     5203
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         0.67      0.18     0.31     1.02 1.00     1996     3701
## logalpha_Intercept    -0.28      0.72    -1.54     1.25 1.00    10251     5841
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

All Trials Fixed & Random Intercepts

mcmc_plot(TwoPL_ranItemTrialsInt, type = "nuts_divergence")

mcmc_plot(TwoPL_ranItemTrialsInt, type = "nuts_treedepth")

mcmc_plot(TwoPL_ranItemTrialsInt, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrialsInt, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrialsInt, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrialsInt, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_ranItemTrialsInt, type = "neff_hist", binwidth = 0.1)

The 2PL model with intercepts varying across trials and fixed parameter intercepts demonstrated no evidence of estimation problems that would raise concerns about the validity of the results. For clarity, readers may note that considerably more trace plots are included in these analyses. This is because these models estimate additional parameters: the standard deviations of the trial-level random effects for difficulty, discrimination, and the intercept, as well as the correlations between each item’s difficulty and discrimination intercepts across trials. Since there are no validity concerns, we can examine the overall results of the model.

mcmc_plot(TwoPL_ranItemTrialsInt, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_ranItemTrialsInt, nsamples = 50, type = "bars")

pp_check(TwoPL_ranItemTrialsInt, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_ranItemTrialsInt, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_ranItemTrialsInt)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + (1 + Time | i | Item)
##          logalpha ~ 1 + (1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.75      0.48     0.12     1.95 1.00     6466     4397
## 
## ~Item (Number of levels: 10) 
##                                        Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Intercept)                         1.13      0.31     0.69     1.87
## sd(beta_Time1)                             1.34      0.30     0.89     2.07
## sd(beta_Time2)                             1.71      0.37     1.16     2.57
## sd(logalpha_Intercept)                     1.16      0.27     0.73     1.80
## sd(logalpha_Time1)                         0.55      0.20     0.24     1.01
## sd(logalpha_Time2)                         0.86      0.23     0.49     1.39
## cor(beta_Intercept,beta_Time1)            -0.43      0.26    -0.82     0.17
## cor(beta_Intercept,beta_Time2)            -0.64      0.24    -0.94    -0.05
## cor(beta_Time1,beta_Time2)                 0.64      0.18     0.21     0.90
## cor(beta_Intercept,logalpha_Intercept)    -0.11      0.24    -0.55     0.37
## cor(beta_Time1,logalpha_Intercept)         0.00      0.26    -0.49     0.51
## cor(beta_Time2,logalpha_Intercept)         0.18      0.26    -0.35     0.65
## cor(beta_Intercept,logalpha_Time1)        -0.10      0.25    -0.57     0.40
## cor(beta_Time1,logalpha_Time1)            -0.03      0.27    -0.54     0.51
## cor(beta_Time2,logalpha_Time1)             0.02      0.24    -0.46     0.49
## cor(logalpha_Intercept,logalpha_Time1)    -0.31      0.29    -0.78     0.31
## cor(beta_Intercept,logalpha_Time2)         0.03      0.24    -0.43     0.51
## cor(beta_Time1,logalpha_Time2)             0.25      0.22    -0.21     0.64
## cor(beta_Time2,logalpha_Time2)             0.06      0.22    -0.38     0.49
## cor(logalpha_Intercept,logalpha_Time2)    -0.69      0.20    -0.94    -0.21
## cor(logalpha_Time1,logalpha_Time2)         0.47      0.30    -0.22     0.91
##                                        Rhat Bulk_ESS Tail_ESS
## sd(beta_Intercept)                     1.00     1757     3356
## sd(beta_Time1)                         1.00     2099     3229
## sd(beta_Time2)                         1.00     1934     3667
## sd(logalpha_Intercept)                 1.00     3528     5035
## sd(logalpha_Time1)                     1.00     3117     3989
## sd(logalpha_Time2)                     1.00     3082     4640
## cor(beta_Intercept,beta_Time1)         1.00     1108     1918
## cor(beta_Intercept,beta_Time2)         1.00      902     1907
## cor(beta_Time1,beta_Time2)             1.00     3523     4499
## cor(beta_Intercept,logalpha_Intercept) 1.00     3502     4655
## cor(beta_Time1,logalpha_Intercept)     1.00     4985     5565
## cor(beta_Time2,logalpha_Intercept)     1.00     4444     4865
## cor(beta_Intercept,logalpha_Time1)     1.00     4793     5210
## cor(beta_Time1,logalpha_Time1)         1.00     6590     6374
## cor(beta_Time2,logalpha_Time1)         1.00     5659     5830
## cor(logalpha_Intercept,logalpha_Time1) 1.00     5719     5585
## cor(beta_Intercept,logalpha_Time2)     1.00     1765     3303
## cor(beta_Time1,logalpha_Time2)         1.00     5146     5519
## cor(beta_Time2,logalpha_Time2)         1.00     4523     5612
## cor(logalpha_Intercept,logalpha_Time2) 1.00     4507     4856
## cor(logalpha_Time1,logalpha_Time2)     1.00     4354     5531
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         0.56      0.39    -0.22     1.31 1.01      637     1237
## logalpha_Intercept    -0.19      0.72    -1.47     1.33 1.00     5838     4552
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

All Trials Fixed Intercept

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "nuts_divergence")

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "nuts_treedepth")

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "neff_hist", binwidth = 0.1)

The 2PL model with fixed parameter intercepts but no random intercepts across trials demonstrated no evidence of estimation problems that would raise concerns about the validity of the results. Since there are no validity concerns, we can examine the overall results of the model.

mcmc_plot(TwoPL_ranItemTrialsNoInt, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_ranItemTrialsNoInt, nsamples = 50, type = "bars")

pp_check(TwoPL_ranItemTrialsNoInt, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_ranItemTrialsNoInt, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_ranItemTrialsNoInt)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + (0 + Time | i | Item)
##          logalpha ~ 1 + (0 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.79      0.48     0.15     1.99 1.00     9694     5520
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.67      0.47     0.97     2.77 1.00
## sd(beta_Time1)                         0.91      0.25     0.56     1.51 1.00
## sd(beta_Time2)                         0.68      0.23     0.38     1.27 1.00
## sd(logalpha_Time0)                     1.49      0.37     0.90     2.35 1.00
## sd(logalpha_Time1)                     0.74      0.20     0.43     1.23 1.00
## sd(logalpha_Time2)                     0.45      0.13     0.26     0.78 1.00
## cor(beta_Time0,beta_Time1)             0.25      0.26    -0.30     0.70 1.00
## cor(beta_Time0,beta_Time2)            -0.32      0.27    -0.75     0.28 1.00
## cor(beta_Time1,beta_Time2)             0.14      0.26    -0.38     0.64 1.00
## cor(beta_Time0,logalpha_Time0)         0.06      0.23    -0.40     0.50 1.00
## cor(beta_Time1,logalpha_Time0)        -0.09      0.25    -0.54     0.40 1.00
## cor(beta_Time2,logalpha_Time0)         0.16      0.25    -0.34     0.63 1.00
## cor(beta_Time0,logalpha_Time1)        -0.01      0.25    -0.50     0.47 1.00
## cor(beta_Time1,logalpha_Time1)        -0.22      0.25    -0.65     0.30 1.00
## cor(beta_Time2,logalpha_Time1)         0.12      0.25    -0.38     0.58 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.60      0.20     0.12     0.90 1.00
## cor(beta_Time0,logalpha_Time2)        -0.22      0.29    -0.72     0.39 1.00
## cor(beta_Time1,logalpha_Time2)         0.02      0.27    -0.50     0.53 1.00
## cor(beta_Time2,logalpha_Time2)         0.27      0.27    -0.32     0.73 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.30      0.26    -0.25     0.74 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.60      0.21     0.09     0.91 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         1513     2338
## sd(beta_Time1)                         3211     4645
## sd(beta_Time2)                         2054     2875
## sd(logalpha_Time0)                     5495     5726
## sd(logalpha_Time1)                     4788     5733
## sd(logalpha_Time2)                     4979     6049
## cor(beta_Time0,beta_Time1)             3076     4132
## cor(beta_Time0,beta_Time2)             4218     5494
## cor(beta_Time1,beta_Time2)             3926     4235
## cor(beta_Time0,logalpha_Time0)         4681     5437
## cor(beta_Time1,logalpha_Time0)         5151     5393
## cor(beta_Time2,logalpha_Time0)         3620     5669
## cor(beta_Time0,logalpha_Time1)         5145     5398
## cor(beta_Time1,logalpha_Time1)         7289     6149
## cor(beta_Time2,logalpha_Time1)         5186     5981
## cor(logalpha_Time0,logalpha_Time1)     6550     6713
## cor(beta_Time0,logalpha_Time2)         7824     5706
## cor(beta_Time1,logalpha_Time2)         7176     6176
## cor(beta_Time2,logalpha_Time2)         6740     6109
## cor(logalpha_Time0,logalpha_Time2)     6133     6228
## cor(logalpha_Time1,logalpha_Time2)     5500     5910
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         1.04      0.22     0.56     1.46 1.00     1345     1696
## logalpha_Intercept    -0.13      0.67    -1.28     1.35 1.00     9336     5636
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Trial 1 Fixed & Random Intercepts

mcmc_plot(TwoPL_ranItemTrial1Int, type = "nuts_divergence")

mcmc_plot(TwoPL_ranItemTrial1Int, type = "nuts_treedepth")

mcmc_plot(TwoPL_ranItemTrial1Int, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrial1Int, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrial1Int, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrial1Int, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_ranItemTrial1Int, type = "neff_hist", binwidth = 0.1)

When initially run, this model did produce two divergent transitions after warmup. To address this estimation issue, the adapt_delta setting, which controls the sampler’s target acceptance rate and therefore its step size, was increased from 0.80 to 0.95. With this higher adapt_delta, the 2PL model with varying intercepts for trial 1 versus trials 2 and 3 demonstrated no evidence of estimation problems that would raise concerns about the validity of the results. Since there are no validity concerns, we can examine the overall results of the model.
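For readers reproducing this step, the adapt_delta change is made through the control argument of brm(). The call below is only a sketch of that sampler setting: the formula, data, and other arguments are abbreviated placeholders, not the exact call from the study script.

```r
library(brms)

# Sketch: raising adapt_delta from its default of 0.80 to 0.95 forces
# smaller sampler steps, which typically eliminates divergent transitions
# at the cost of longer run times. my_formula is a hypothetical placeholder.
fit <- brm(
  formula = my_formula,
  data    = df_long,
  family  = bernoulli(link = "logit"),
  control = list(adapt_delta = 0.95)
)
```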

mcmc_plot(TwoPL_ranItemTrial1Int, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_ranItemTrial1Int, nsamples = 50, type = "bars")

pp_check(TwoPL_ranItemTrial1Int, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_ranItemTrial1Int, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_ranItemTrial1Int)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + (1 + Trial1 | i | Item)
##          logalpha ~ 1 + (1 + Trial1 | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.79      0.48     0.15     1.95 1.00     3995     4347
## 
## ~Item (Number of levels: 10) 
##                                          Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Intercept)                           0.60      0.19     0.34     1.09
## sd(beta_Trial11)                             1.71      0.43     1.09     2.74
## sd(logalpha_Intercept)                       0.68      0.20     0.40     1.16
## sd(logalpha_Trial11)                         1.16      0.37     0.60     2.02
## cor(beta_Intercept,beta_Trial11)            -0.13      0.37    -0.75     0.61
## cor(beta_Intercept,logalpha_Intercept)       0.05      0.30    -0.52     0.60
## cor(beta_Trial11,logalpha_Intercept)        -0.14      0.35    -0.74     0.55
## cor(beta_Intercept,logalpha_Trial11)         0.05      0.30    -0.51     0.60
## cor(beta_Trial11,logalpha_Trial11)           0.07      0.26    -0.44     0.56
## cor(logalpha_Intercept,logalpha_Trial11)     0.37      0.27    -0.22     0.81
##                                          Rhat Bulk_ESS Tail_ESS
## sd(beta_Intercept)                       1.00     1587     2609
## sd(beta_Trial11)                         1.00     1748     3126
## sd(logalpha_Intercept)                   1.00     2581     3873
## sd(logalpha_Trial11)                     1.00     3023     4353
## cor(beta_Intercept,beta_Trial11)         1.00      475      968
## cor(beta_Intercept,logalpha_Intercept)   1.00     2280     3855
## cor(beta_Trial11,logalpha_Intercept)     1.00     2304     3550
## cor(beta_Intercept,logalpha_Trial11)     1.00     1491     3270
## cor(beta_Trial11,logalpha_Trial11)       1.00     2870     4103
## cor(logalpha_Intercept,logalpha_Trial11) 1.00     2106     3948
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         1.05      0.26     0.50     1.56 1.01      386     1156
## logalpha_Intercept    -0.15      0.68    -1.32     1.31 1.00     3490     4409
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Trial 1 Fixed Intercept

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "nuts_divergence")

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "nuts_treedepth")

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "neff_hist", binwidth = 0.1)

When initially run, this model also produced a divergent transition after warmup. To address this estimation issue, the adapt_delta setting was again increased from 0.80 to 0.95. With this higher adapt_delta, the 2PL model with fixed intercepts for trial 1 versus trials 2 and 3 demonstrated no evidence of estimation problems that would raise concerns about the validity of the results. Since there are no validity concerns, we can examine the overall results of the model.

mcmc_plot(TwoPL_ranItemTrial1NoInt, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_ranItemTrial1NoInt, nsamples = 50, type = "bars")

pp_check(TwoPL_ranItemTrial1NoInt, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_ranItemTrial1NoInt, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_ranItemTrial1NoInt)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + (0 + Trial1 | i | Item)
##          logalpha ~ 1 + (0 + Trial1 | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.76      0.50     0.13     2.00 1.00     4884     4113
## 
## ~Item (Number of levels: 10) 
##                                        Estimate Est.Error l-95% CI u-95% CI
## sd(beta_Trial10)                           0.65      0.26     0.35     1.31
## sd(beta_Trial11)                           1.58      0.48     0.87     2.76
## sd(logalpha_Trial10)                       0.63      0.17     0.37     1.04
## sd(logalpha_Trial11)                       1.49      0.38     0.91     2.35
## cor(beta_Trial10,beta_Trial11)            -0.01      0.34    -0.61     0.66
## cor(beta_Trial10,logalpha_Trial10)         0.09      0.30    -0.48     0.64
## cor(beta_Trial11,logalpha_Trial10)        -0.20      0.32    -0.74     0.45
## cor(beta_Trial10,logalpha_Trial11)         0.02      0.28    -0.51     0.57
## cor(beta_Trial11,logalpha_Trial11)        -0.02      0.27    -0.52     0.49
## cor(logalpha_Trial10,logalpha_Trial11)     0.55      0.23     0.03     0.89
##                                        Rhat Bulk_ESS Tail_ESS
## sd(beta_Trial10)                       1.00     1103     2059
## sd(beta_Trial11)                       1.00     1171     2675
## sd(logalpha_Trial10)                   1.00     2552     3976
## sd(logalpha_Trial11)                   1.00     3665     5306
## cor(beta_Trial10,beta_Trial11)         1.00      829     1597
## cor(beta_Trial10,logalpha_Trial10)     1.00     2964     3855
## cor(beta_Trial11,logalpha_Trial10)     1.00     3071     4032
## cor(beta_Trial10,logalpha_Trial11)     1.00     1889     3693
## cor(beta_Trial11,logalpha_Trial11)     1.00     3098     4412
## cor(logalpha_Trial10,logalpha_Trial11) 1.00     2027     3576
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         0.91      0.30     0.25     1.43 1.01      579     1346
## logalpha_Intercept    -0.16      0.71    -1.41     1.35 1.00     4833     4253
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Local Dependency Model

mcmc_plot(TwoPL_depmd, type = "nuts_divergence")

mcmc_plot(TwoPL_depmd, type = "nuts_treedepth")

mcmc_plot(TwoPL_depmd, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_depmd, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_depmd, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_depmd, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_depmd, type = "neff_hist", binwidth = 0.1)

The 2PL model with a local dependency effect demonstrated no evidence of estimation problems that would raise concerns about the validity of the results. Since there are no validity concerns, we can examine the overall results of the model.

mcmc_plot(TwoPL_depmd, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_depmd, nsamples = 50, type = "bars")

pp_check(TwoPL_depmd, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_depmd, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_depmd)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + (-1 + Time | i | Item)
##          logalpha ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.67      0.45     0.11     1.82 1.00     9764     5282
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.27      0.37     0.74     2.16 1.00
## sd(beta_Time1)                         0.91      0.25     0.55     1.53 1.00
## sd(beta_Time2)                         0.82      0.24     0.48     1.43 1.00
## sd(logalpha_Time0)                     1.63      0.39     1.01     2.53 1.00
## sd(logalpha_Time1)                     0.81      0.32     0.31     1.56 1.00
## sd(logalpha_Time2)                     0.50      0.22     0.11     0.98 1.00
## cor(beta_Time0,beta_Time1)             0.31      0.25    -0.25     0.74 1.00
## cor(beta_Time0,beta_Time2)            -0.21      0.26    -0.67     0.33 1.00
## cor(beta_Time1,beta_Time2)             0.23      0.25    -0.30     0.68 1.00
## cor(beta_Time0,logalpha_Time0)        -0.09      0.24    -0.53     0.38 1.00
## cor(beta_Time1,logalpha_Time0)        -0.03      0.24    -0.48     0.45 1.00
## cor(beta_Time2,logalpha_Time0)         0.29      0.24    -0.21     0.70 1.00
## cor(beta_Time0,logalpha_Time1)        -0.03      0.27    -0.55     0.47 1.00
## cor(beta_Time1,logalpha_Time1)        -0.01      0.27    -0.54     0.51 1.00
## cor(beta_Time2,logalpha_Time1)         0.23      0.28    -0.34     0.71 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.56      0.23     0.03     0.90 1.00
## cor(beta_Time0,logalpha_Time2)        -0.08      0.30    -0.64     0.51 1.00
## cor(beta_Time1,logalpha_Time2)         0.13      0.31    -0.49     0.68 1.00
## cor(beta_Time2,logalpha_Time2)         0.29      0.30    -0.35     0.80 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.17      0.30    -0.44     0.73 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.47      0.28    -0.20     0.88 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         1720     2602
## sd(beta_Time1)                         3308     5596
## sd(beta_Time2)                         2469     4312
## sd(logalpha_Time0)                     6088     6761
## sd(logalpha_Time1)                     2696     2733
## sd(logalpha_Time2)                     2008     1383
## cor(beta_Time0,beta_Time1)             3842     5199
## cor(beta_Time0,beta_Time2)             4779     5177
## cor(beta_Time1,beta_Time2)             4385     4932
## cor(beta_Time0,logalpha_Time0)         5474     5463
## cor(beta_Time1,logalpha_Time0)         6602     5979
## cor(beta_Time2,logalpha_Time0)         4232     5412
## cor(beta_Time0,logalpha_Time1)         6394     5946
## cor(beta_Time1,logalpha_Time1)         7633     6133
## cor(beta_Time2,logalpha_Time1)         5289     5555
## cor(logalpha_Time0,logalpha_Time1)     6604     6444
## cor(beta_Time0,logalpha_Time2)         9588     6431
## cor(beta_Time1,logalpha_Time2)         8505     6599
## cor(beta_Time2,logalpha_Time2)         6561     6794
## cor(logalpha_Time0,logalpha_Time2)     5705     5733
## cor(logalpha_Time1,logalpha_Time2)     5307     4443
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         0.52      0.22     0.07     0.95 1.00     1184     1996
## beta_DepButter         0.10      0.09    -0.08     0.26 1.00     5312     5759
## beta_DepArm            0.02      0.09    -0.16     0.18 1.00     3956     5770
## beta_DepShore          0.44      0.08     0.28     0.59 1.00     5895     5969
## beta_DepLetter         0.42      0.06     0.31     0.54 1.00    10343     6025
## beta_DepQueen          0.27      0.07     0.13     0.41 1.00     8123     6214
## beta_DepCabin          0.20      0.07     0.07     0.33 1.00     7043     5941
## beta_DepPole           0.25      0.07     0.12     0.38 1.00    11485     5876
## beta_DepTicket         0.22      0.07     0.09     0.35 1.00    10411     6190
## beta_DepGrass          0.50      0.08     0.35     0.65 1.00    11210     6082
## beta_DepEngine         0.23      0.06     0.12     0.35 1.00    16286     6281
## logalpha_Intercept    -0.31      0.74    -1.57     1.28 1.00     9347     5466
## logalpha_DepButter     0.07      0.13    -0.18     0.36 1.00     5546     4942
## logalpha_DepArm        0.30      0.10     0.10     0.49 1.00     3376     5166
## logalpha_DepShore      0.06      0.12    -0.18     0.31 1.00     8927     6267
## logalpha_DepLetter    -1.03      0.62    -2.43     0.01 1.00     6629     5933
## logalpha_DepQueen      0.08      0.11    -0.12     0.29 1.00     4103     4470
## logalpha_DepCabin      0.15      0.13    -0.10     0.41 1.00     4169     5928
## logalpha_DepPole      -0.48      0.59    -2.03     0.29 1.00     3211     4244
## logalpha_DepTicket     0.24      0.10     0.03     0.44 1.00     6959     6465
## logalpha_DepGrass      0.14      0.21    -0.27     0.59 1.00     3858     5076
## logalpha_DepEngine    -0.03      0.20    -0.43     0.38 1.00     5403     6075
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Local Dependency (Unique by Item Easiness)

mcmc_plot(TwoPL_depea, type = "nuts_divergence")

mcmc_plot(TwoPL_depea, type = "nuts_treedepth")

mcmc_plot(TwoPL_depea, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_depea, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_depea, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_depea, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_depea, type = "neff_hist", binwidth = 0.1)

The 2PL model with a local dependency effect only on the easiness parameter demonstrated no evidence of estimation issues that would raise concerns about the validity of the results. Since there are no such concerns, we can look at the results from the model overall.
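The Rhat and effective-sample-size histograms can also be complemented with a direct numeric check. A minimal sketch, assuming the fitted brmsfit object `TwoPL_depea` is in the workspace:

```r
# Numeric complement to the rhat_hist and neff_hist plots:
# flag parameters with Rhat above 1.01 or unusually low effective sample ratios.
rhats  <- brms::rhat(TwoPL_depea)
ratios <- brms::neff_ratio(TwoPL_depea)
any(rhats > 1.01, na.rm = TRUE)  # TRUE would indicate a convergence concern
head(sort(ratios), 5)            # parameters with the fewest effective samples
```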

mcmc_plot(TwoPL_depea, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_depea, nsamples = 50, type = "bars")

pp_check(TwoPL_depea, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_depea, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_depea)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.72      0.46     0.12     1.85 1.00     7820     4729
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.32      0.39     0.76     2.29 1.00
## sd(beta_Time1)                         0.98      0.27     0.60     1.65 1.00
## sd(beta_Time2)                         0.93      0.28     0.54     1.63 1.00
## sd(logalpha_Time0)                     1.63      0.40     1.00     2.55 1.00
## sd(logalpha_Time1)                     1.03      0.31     0.59     1.76 1.00
## sd(logalpha_Time2)                     0.62      0.19     0.35     1.08 1.00
## cor(beta_Time0,beta_Time1)             0.28      0.26    -0.27     0.71 1.00
## cor(beta_Time0,beta_Time2)            -0.23      0.26    -0.68     0.33 1.00
## cor(beta_Time1,beta_Time2)             0.28      0.25    -0.26     0.72 1.00
## cor(beta_Time0,logalpha_Time0)        -0.08      0.24    -0.53     0.38 1.00
## cor(beta_Time1,logalpha_Time0)        -0.00      0.24    -0.47     0.48 1.00
## cor(beta_Time2,logalpha_Time0)         0.28      0.24    -0.22     0.69 1.00
## cor(beta_Time0,logalpha_Time1)         0.02      0.25    -0.47     0.48 1.00
## cor(beta_Time1,logalpha_Time1)         0.02      0.25    -0.48     0.50 1.00
## cor(beta_Time2,logalpha_Time1)         0.18      0.26    -0.33     0.65 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.50      0.21     0.02     0.83 1.00
## cor(beta_Time0,logalpha_Time2)        -0.07      0.27    -0.59     0.46 1.00
## cor(beta_Time1,logalpha_Time2)         0.26      0.27    -0.30     0.72 1.00
## cor(beta_Time2,logalpha_Time2)         0.30      0.27    -0.27     0.75 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.15      0.25    -0.35     0.62 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.58      0.22     0.07     0.90 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         1154     2286
## sd(beta_Time1)                         2722     3842
## sd(beta_Time2)                         1966     3037
## sd(logalpha_Time0)                     5373     5484
## sd(logalpha_Time1)                     5284     5980
## sd(logalpha_Time2)                     4564     4942
## cor(beta_Time0,beta_Time1)             3062     4242
## cor(beta_Time0,beta_Time2)             3886     4852
## cor(beta_Time1,beta_Time2)             4003     4825
## cor(beta_Time0,logalpha_Time0)         4136     5527
## cor(beta_Time1,logalpha_Time0)         4418     5362
## cor(beta_Time2,logalpha_Time0)         3426     5295
## cor(beta_Time0,logalpha_Time1)         5002     5969
## cor(beta_Time1,logalpha_Time1)         5503     5390
## cor(beta_Time2,logalpha_Time1)         4262     5194
## cor(logalpha_Time0,logalpha_Time1)     5781     6278
## cor(beta_Time0,logalpha_Time2)         6308     5832
## cor(beta_Time1,logalpha_Time2)         5667     5815
## cor(beta_Time2,logalpha_Time2)         5887     6121
## cor(logalpha_Time0,logalpha_Time2)     5633     5822
## cor(logalpha_Time1,logalpha_Time2)     5801     6971
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         0.54      0.24     0.07     0.99 1.00      975     1613
## beta_DepButter         0.09      0.08    -0.07     0.24 1.00     6116     5519
## beta_DepArm           -0.12      0.09    -0.30     0.06 1.00     3814     5255
## beta_DepShore          0.45      0.08     0.29     0.60 1.00     5940     5951
## beta_DepLetter         0.42      0.06     0.31     0.53 1.00    11326     6457
## beta_DepQueen          0.23      0.07     0.09     0.36 1.00     9820     5904
## beta_DepCabin          0.18      0.07     0.05     0.31 1.00     6794     6087
## beta_DepPole           0.24      0.07     0.11     0.37 1.00    11461     5960
## beta_DepTicket         0.16      0.06     0.03     0.29 1.00    10811     6127
## beta_DepGrass          0.48      0.07     0.34     0.63 1.00    10776     6514
## beta_DepEngine         0.23      0.06     0.12     0.34 1.00    13633     6502
## logalpha_Intercept    -0.28      0.70    -1.46     1.25 1.00     7308     5273
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
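Because the discrimination parameter is modeled on the log scale, its posterior summaries are easiest to interpret after exponentiation. A minimal sketch, assuming the fitted object is available:

```r
# Convert the log-discrimination intercept to the usual alpha scale.
# Exponentiating the interval endpoints is valid because exp() is monotone.
fe <- brms::fixef(TwoPL_depea)
exp(fe["logalpha_Intercept", c("Estimate", "Q2.5", "Q97.5")])
```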

Local Dependency (Shared Effect on Item Easiness)

mcmc_plot(TwoPL_depsi, type = "nuts_divergence")

mcmc_plot(TwoPL_depsi, type = "nuts_treedepth")

mcmc_plot(TwoPL_depsi, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_depsi, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_depsi, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_depsi, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_depsi, type = "neff_hist", binwidth = 0.1)

The 2PL model with a single local dependency effect across items on the easiness parameter demonstrated no evidence of estimation issues that would raise concerns about the validity of the results. Since there are no such concerns, we can look at the results from the model overall.

mcmc_plot(TwoPL_depsi, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_depsi, nsamples = 50, type = "bars")

pp_check(TwoPL_depsi, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_depsi, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_depsi)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + LocDep + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.72      0.46     0.12     1.86 1.00     9797     5387
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.30      0.37     0.77     2.20 1.00
## sd(beta_Time1)                         0.88      0.25     0.54     1.48 1.00
## sd(beta_Time2)                         0.72      0.22     0.41     1.26 1.00
## sd(logalpha_Time0)                     1.61      0.39     0.97     2.49 1.00
## sd(logalpha_Time1)                     0.87      0.26     0.50     1.50 1.00
## sd(logalpha_Time2)                     0.55      0.17     0.30     0.95 1.00
## cor(beta_Time0,beta_Time1)             0.26      0.26    -0.27     0.71 1.00
## cor(beta_Time0,beta_Time2)            -0.32      0.25    -0.73     0.23 1.00
## cor(beta_Time1,beta_Time2)             0.14      0.26    -0.38     0.61 1.00
## cor(beta_Time0,logalpha_Time0)        -0.05      0.24    -0.50     0.41 1.00
## cor(beta_Time1,logalpha_Time0)        -0.07      0.25    -0.53     0.42 1.00
## cor(beta_Time2,logalpha_Time0)         0.23      0.24    -0.28     0.66 1.00
## cor(beta_Time0,logalpha_Time1)        -0.02      0.25    -0.51     0.46 1.00
## cor(beta_Time1,logalpha_Time1)        -0.15      0.26    -0.62     0.36 1.00
## cor(beta_Time2,logalpha_Time1)         0.11      0.25    -0.39     0.59 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.53      0.21     0.06     0.85 1.00
## cor(beta_Time0,logalpha_Time2)        -0.16      0.27    -0.64     0.39 1.00
## cor(beta_Time1,logalpha_Time2)         0.04      0.27    -0.48     0.54 1.00
## cor(beta_Time2,logalpha_Time2)         0.24      0.27    -0.31     0.70 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.20      0.25    -0.30     0.67 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.60      0.22     0.08     0.91 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         1614     2759
## sd(beta_Time1)                         2890     4606
## sd(beta_Time2)                         2394     3859
## sd(logalpha_Time0)                     5813     5620
## sd(logalpha_Time1)                     4556     5446
## sd(logalpha_Time2)                     4602     5769
## cor(beta_Time0,beta_Time1)             2876     4719
## cor(beta_Time0,beta_Time2)             4720     5246
## cor(beta_Time1,beta_Time2)             4710     4900
## cor(beta_Time0,logalpha_Time0)         5666     5352
## cor(beta_Time1,logalpha_Time0)         6168     5450
## cor(beta_Time2,logalpha_Time0)         4383     5075
## cor(beta_Time0,logalpha_Time1)         6442     5771
## cor(beta_Time1,logalpha_Time1)         6716     6459
## cor(beta_Time2,logalpha_Time1)         5424     6099
## cor(logalpha_Time0,logalpha_Time1)     6643     6944
## cor(beta_Time0,logalpha_Time2)         8405     6806
## cor(beta_Time1,logalpha_Time2)         7937     6183
## cor(beta_Time2,logalpha_Time2)         6903     6408
## cor(logalpha_Time0,logalpha_Time2)     5868     6488
## cor(logalpha_Time1,logalpha_Time2)     6185     6820
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         0.59      0.18     0.22     0.95 1.00     1434     1955
## beta_LocDep            0.24      0.02     0.20     0.29 1.00    11525     5612
## logalpha_Intercept    -0.26      0.70    -1.47     1.28 1.00     9526     5548
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
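The shared local dependency effect is on the logit scale, so exponentiating it yields an odds-ratio interpretation. A minimal sketch, assuming the fitted object is available:

```r
# Odds-ratio interpretation of the shared local dependency effect:
# exp(beta_LocDep) is the multiplicative change in the odds of a correct
# response attributable to local dependency.
fe <- brms::fixef(TwoPL_depsi)
exp(fe["beta_LocDep", c("Estimate", "Q2.5", "Q97.5")])
```

With the posterior mean of 0.24 shown in the summary, this works out to roughly a 27% increase in the odds of a correct response (exp(0.24) ≈ 1.27).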

Learning (Growth) Model

mcmc_plot(TwoPL_learn, type = "nuts_divergence")

mcmc_plot(TwoPL_learn, type = "nuts_treedepth")

mcmc_plot(TwoPL_learn, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_learn, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_learn, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_learn, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_learn, type = "neff_hist", binwidth = 0.1)

The 2PL model with a multidimensional growth effect demonstrated no evidence of estimation issues that would raise concerns about the validity of the results. Since there are no such concerns, we can look at the results from the model overall.

mcmc_plot(TwoPL_learn, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_learn, nsamples = 50, type = "bars")

pp_check(TwoPL_learn, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_learn, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_learn)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (-1 + Time1 + Time2 + Time3 | ID)
##          beta ~ 1 + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(theta_Time1)                  1.33      0.59     0.40     2.63 1.00     4661
## sd(theta_Time2)                  0.15      0.20     0.00     0.69 1.00     1511
## sd(theta_Time3)                  0.11      0.14     0.00     0.48 1.00     3044
## cor(theta_Time1,theta_Time2)     0.18      0.56    -0.88     0.97 1.00     4692
## cor(theta_Time1,theta_Time3)     0.17      0.56    -0.88     0.97 1.00     6254
## cor(theta_Time2,theta_Time3)     0.01      0.51    -0.88     0.88 1.00     5187
##                              Tail_ESS
## sd(theta_Time1)                  3307
## sd(theta_Time2)                  3135
## sd(theta_Time3)                  3826
## cor(theta_Time1,theta_Time2)     3672
## cor(theta_Time1,theta_Time3)     4528
## cor(theta_Time2,theta_Time3)     5283
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.68      0.49     0.94     2.88 1.00
## sd(beta_Time1)                         0.92      0.26     0.56     1.56 1.00
## sd(beta_Time2)                         0.69      0.24     0.37     1.28 1.00
## sd(logalpha_Time0)                     1.49      0.38     0.90     2.36 1.00
## sd(logalpha_Time1)                     0.75      0.21     0.44     1.25 1.00
## sd(logalpha_Time2)                     0.44      0.13     0.25     0.75 1.00
## cor(beta_Time0,beta_Time1)             0.26      0.27    -0.30     0.73 1.00
## cor(beta_Time0,beta_Time2)            -0.31      0.27    -0.76     0.28 1.00
## cor(beta_Time1,beta_Time2)             0.15      0.27    -0.40     0.65 1.00
## cor(beta_Time0,logalpha_Time0)         0.04      0.24    -0.43     0.49 1.00
## cor(beta_Time1,logalpha_Time0)        -0.09      0.26    -0.57     0.42 1.00
## cor(beta_Time2,logalpha_Time0)         0.17      0.25    -0.33     0.62 1.00
## cor(beta_Time0,logalpha_Time1)        -0.02      0.26    -0.52     0.46 1.00
## cor(beta_Time1,logalpha_Time1)        -0.23      0.25    -0.66     0.30 1.00
## cor(beta_Time2,logalpha_Time1)         0.12      0.25    -0.38     0.58 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.60      0.21     0.11     0.90 1.00
## cor(beta_Time0,logalpha_Time2)        -0.19      0.30    -0.70     0.42 1.00
## cor(beta_Time1,logalpha_Time2)         0.02      0.27    -0.51     0.54 1.00
## cor(beta_Time2,logalpha_Time2)         0.26      0.27    -0.30     0.73 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.32      0.25    -0.21     0.76 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.60      0.21     0.10     0.91 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                          826     1390
## sd(beta_Time1)                         1632     3260
## sd(beta_Time2)                         1052     1497
## sd(logalpha_Time0)                     3385     5098
## sd(logalpha_Time1)                     3405     4869
## sd(logalpha_Time2)                     3761     4819
## cor(beta_Time0,beta_Time1)             1564     2786
## cor(beta_Time0,beta_Time2)             2533     3685
## cor(beta_Time1,beta_Time2)             2399     3084
## cor(beta_Time0,logalpha_Time0)         2536     4172
## cor(beta_Time1,logalpha_Time0)         3060     4360
## cor(beta_Time2,logalpha_Time0)         1952     3506
## cor(beta_Time0,logalpha_Time1)         3400     4410
## cor(beta_Time1,logalpha_Time1)         4056     4815
## cor(beta_Time2,logalpha_Time1)         3062     4886
## cor(logalpha_Time0,logalpha_Time1)     4174     5048
## cor(beta_Time0,logalpha_Time2)         4863     5073
## cor(beta_Time1,logalpha_Time2)         4914     5092
## cor(beta_Time2,logalpha_Time2)         4830     5639
## cor(logalpha_Time0,logalpha_Time2)     3794     5495
## cor(logalpha_Time1,logalpha_Time2)     4577     6036
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         1.03      0.24     0.47     1.44 1.00      658      658
## logalpha_Intercept    -0.80      0.50    -1.68     0.30 1.00     4238     3760
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
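Person-specific ability estimates for each trial can be extracted from the group-level effects of the growth model. A minimal sketch, assuming brms's default naming of the theta coefficients:

```r
# Extract per-person theta estimates by trial from the growth model.
# ranef() returns an array: ID x summary statistic x coefficient.
re <- brms::ranef(TwoPL_learn)$ID
dimnames(re)[[3]]                      # should include theta_Time1 through theta_Time3
head(re[, "Estimate", "theta_Time1"])  # posterior means for the first trial
```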

Multidimensional Model

mcmc_plot(TwoPL_multi, type = "nuts_divergence")

mcmc_plot(TwoPL_multi, type = "nuts_treedepth")

mcmc_plot(TwoPL_multi, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_multi, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_multi, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_multi, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_multi, type = "neff_hist", binwidth = 0.1)

When initially run, the 2PL model with multidimensional factors by trial had low tail effective sample sizes. The model was refit with slightly more iterations (4,000 per chain, of which 2,000 were warmup), which resolved the issue; the refit model demonstrated no evidence of estimation issues that would raise concerns about the validity of the results. Since there are no such concerns, we can look at the results from the model overall.
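The refit with longer chains does not require recompiling the Stan model; brms can update an existing fit in place. A minimal sketch, with iteration counts matching the model summary:

```r
# Refit the multidimensional model with more iterations to raise the
# tail effective sample sizes; update() reuses the compiled Stan model.
TwoPL_multi <- update(TwoPL_multi, iter = 4000, warmup = 2000)
```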

mcmc_plot(TwoPL_multi, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_multi, nsamples = 50, type = "bars")

pp_check(TwoPL_multi, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_multi, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_multi)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (-1 + Time | ID)
##          beta ~ 1 + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## sd(theta_Time0)                  0.58      0.34     0.13     1.42 1.00     2893
## sd(theta_Time1)                  0.83      0.37     0.26     1.68 1.00     2935
## sd(theta_Time2)                  1.04      0.45     0.34     2.05 1.00     2791
## cor(theta_Time0,theta_Time1)     0.99      0.01     0.97     1.00 1.00     1051
## cor(theta_Time0,theta_Time2)     0.99      0.01     0.96     1.00 1.00      988
## cor(theta_Time1,theta_Time2)     0.99      0.01     0.98     1.00 1.00     3829
##                              Tail_ESS
## sd(theta_Time0)                  3162
## sd(theta_Time1)                  3019
## sd(theta_Time2)                  2766
## cor(theta_Time0,theta_Time1)     2217
## cor(theta_Time0,theta_Time2)     1999
## cor(theta_Time1,theta_Time2)     5616
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.68      0.49     0.99     2.85 1.01
## sd(beta_Time1)                         0.90      0.25     0.56     1.52 1.00
## sd(beta_Time2)                         0.66      0.21     0.37     1.20 1.00
## sd(logalpha_Time0)                     1.53      0.38     0.93     2.41 1.00
## sd(logalpha_Time1)                     0.79      0.23     0.45     1.34 1.00
## sd(logalpha_Time2)                     0.44      0.13     0.25     0.76 1.00
## cor(beta_Time0,beta_Time1)             0.25      0.26    -0.30     0.72 1.00
## cor(beta_Time0,beta_Time2)            -0.30      0.28    -0.75     0.30 1.00
## cor(beta_Time1,beta_Time2)             0.15      0.26    -0.38     0.63 1.00
## cor(beta_Time0,logalpha_Time0)        -0.10      0.27    -0.60     0.42 1.00
## cor(beta_Time1,logalpha_Time0)        -0.10      0.25    -0.55     0.40 1.00
## cor(beta_Time2,logalpha_Time0)         0.27      0.24    -0.24     0.69 1.00
## cor(beta_Time0,logalpha_Time1)        -0.09      0.28    -0.62     0.46 1.00
## cor(beta_Time1,logalpha_Time1)        -0.23      0.25    -0.68     0.29 1.00
## cor(beta_Time2,logalpha_Time1)         0.16      0.26    -0.36     0.62 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.56      0.22     0.06     0.88 1.00
## cor(beta_Time0,logalpha_Time2)        -0.13      0.30    -0.68     0.47 1.00
## cor(beta_Time1,logalpha_Time2)         0.03      0.26    -0.48     0.53 1.00
## cor(beta_Time2,logalpha_Time2)         0.23      0.27    -0.35     0.70 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.36      0.25    -0.18     0.77 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.60      0.22     0.07     0.91 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                          858     1354
## sd(beta_Time1)                         2045     3346
## sd(beta_Time2)                         1430     2367
## sd(logalpha_Time0)                     3380     4809
## sd(logalpha_Time1)                     3470     5125
## sd(logalpha_Time2)                     2950     4020
## cor(beta_Time0,beta_Time1)             1731     2936
## cor(beta_Time0,beta_Time2)             1804     3267
## cor(beta_Time1,beta_Time2)             2217     3692
## cor(beta_Time0,logalpha_Time0)         4381     4801
## cor(beta_Time1,logalpha_Time0)         4087     5195
## cor(beta_Time2,logalpha_Time0)         3086     4310
## cor(beta_Time0,logalpha_Time1)         4375     5041
## cor(beta_Time1,logalpha_Time1)         4217     4771
## cor(beta_Time2,logalpha_Time1)         3811     4490
## cor(logalpha_Time0,logalpha_Time1)     4105     5290
## cor(beta_Time0,logalpha_Time2)         5024     5241
## cor(beta_Time1,logalpha_Time2)         4380     5024
## cor(beta_Time2,logalpha_Time2)         4379     4639
## cor(logalpha_Time0,logalpha_Time2)     4121     4698
## cor(logalpha_Time1,logalpha_Time2)     4179     5542
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         1.05      0.21     0.60     1.43 1.01      782     1247
## logalpha_Intercept    -0.47      0.48    -1.30     0.57 1.00     2691     2678
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
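The near-unity correlations among the trial-specific theta dimensions suggest a single latent dimension; how much posterior mass sits above a high threshold can be checked directly. A minimal sketch, assuming brms's default draw-variable naming (`cor_ID__theta_Time0__theta_Time1`, etc.):

```r
# Posterior probability that the Time0-Time1 ability correlation exceeds .95.
draws <- posterior::as_draws_df(TwoPL_multi)
mean(draws$cor_ID__theta_Time0__theta_Time1 > 0.95)
```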

Serial Position Effect

mcmc_plot(TwoPL_srlps, type = "nuts_divergence")

mcmc_plot(TwoPL_srlps, type = "nuts_treedepth")

mcmc_plot(TwoPL_srlps, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_srlps, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_srlps, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_srlps, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_srlps, type = "neff_hist", binwidth = 0.1)

The 2PL model with a serial position effect demonstrated no evidence of estimation issues that would raise concerns about the validity of the results. Since there are no such concerns, we can look at the results from the model overall.

mcmc_plot(TwoPL_srlps, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_srlps, nsamples = 50, type = "bars")

pp_check(TwoPL_srlps, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_srlps, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_srlps)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + poly(ItemPos - 1, 2) + (-1 + Time | i | Item)
##          logalpha ~ 1 + poly(ItemPos - 1, 2) + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.72      0.47     0.12     1.88 1.00     7503     5417
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.28      0.38     0.75     2.22 1.00
## sd(beta_Time1)                         0.97      0.27     0.59     1.63 1.00
## sd(beta_Time2)                         0.91      0.27     0.53     1.57 1.00
## sd(logalpha_Time0)                     1.63      0.40     0.99     2.53 1.00
## sd(logalpha_Time1)                     1.03      0.30     0.58     1.77 1.00
## sd(logalpha_Time2)                     0.61      0.18     0.35     1.05 1.00
## cor(beta_Time0,beta_Time1)             0.28      0.25    -0.25     0.72 1.00
## cor(beta_Time0,beta_Time2)            -0.23      0.25    -0.67     0.31 1.00
## cor(beta_Time1,beta_Time2)             0.27      0.26    -0.28     0.70 1.00
## cor(beta_Time0,logalpha_Time0)        -0.09      0.24    -0.54     0.39 1.00
## cor(beta_Time1,logalpha_Time0)        -0.01      0.24    -0.47     0.46 1.00
## cor(beta_Time2,logalpha_Time0)         0.28      0.23    -0.20     0.69 1.00
## cor(beta_Time0,logalpha_Time1)         0.01      0.25    -0.47     0.49 1.00
## cor(beta_Time1,logalpha_Time1)         0.01      0.25    -0.49     0.49 1.00
## cor(beta_Time2,logalpha_Time1)         0.19      0.25    -0.34     0.64 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.49      0.22    -0.00     0.84 1.00
## cor(beta_Time0,logalpha_Time2)        -0.07      0.27    -0.58     0.46 1.00
## cor(beta_Time1,logalpha_Time2)         0.26      0.26    -0.29     0.71 1.00
## cor(beta_Time2,logalpha_Time2)         0.30      0.27    -0.26     0.76 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.15      0.25    -0.36     0.61 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.57      0.22     0.06     0.90 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         1612     2281
## sd(beta_Time1)                         2800     4438
## sd(beta_Time2)                         1809     2963
## sd(logalpha_Time0)                     5548     6286
## sd(logalpha_Time1)                     5338     6238
## sd(logalpha_Time2)                     4450     5336
## cor(beta_Time0,beta_Time1)             3003     3967
## cor(beta_Time0,beta_Time2)             4126     4810
## cor(beta_Time1,beta_Time2)             3501     4409
## cor(beta_Time0,logalpha_Time0)         4433     5242
## cor(beta_Time1,logalpha_Time0)         4962     5816
## cor(beta_Time2,logalpha_Time0)         3977     5440
## cor(beta_Time0,logalpha_Time1)         5491     5649
## cor(beta_Time1,logalpha_Time1)         5947     6541
## cor(beta_Time2,logalpha_Time1)         4552     6202
## cor(logalpha_Time0,logalpha_Time1)     6261     6248
## cor(beta_Time0,logalpha_Time2)         7388     6146
## cor(beta_Time1,logalpha_Time2)         5438     5828
## cor(beta_Time2,logalpha_Time2)         6527     6352
## cor(logalpha_Time0,logalpha_Time2)     5600     6135
## cor(logalpha_Time1,logalpha_Time2)     5944     6077
## 
## Population-Level Effects: 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept               0.54      0.24     0.08     1.01 1.00     1028
## beta_DepButter               0.09      0.08    -0.07     0.24 1.00     7093
## beta_DepArm                 -0.11      0.09    -0.30     0.06 1.00     3846
## beta_DepShore                0.45      0.08     0.30     0.61 1.00     5793
## beta_DepLetter               0.42      0.06     0.30     0.54 1.00    13324
## beta_DepQueen                0.23      0.07     0.10     0.37 1.00    10710
## beta_DepCabin                0.18      0.07     0.05     0.32 1.00     8003
## beta_DepPole                 0.24      0.07     0.11     0.37 1.00    12604
## beta_DepTicket               0.16      0.06     0.04     0.29 1.00    12334
## beta_DepGrass                0.48      0.08     0.33     0.63 1.00    12392
## beta_DepEngine               0.23      0.06     0.12     0.35 1.00    15090
## beta_polyItemPosM121         0.26      2.00    -3.64     4.27 1.00    21265
## beta_polyItemPosM122         0.40      1.96    -3.47     4.22 1.00    18852
## logalpha_Intercept          -0.26      0.71    -1.48     1.26 1.00     7248
## logalpha_polyItemPosM121    -0.17      1.00    -2.18     1.84 1.00    17834
## logalpha_polyItemPosM122     0.03      1.00    -1.94     1.95 1.00    20710
##                          Tail_ESS
## beta_Intercept               1706
## beta_DepButter               5512
## beta_DepArm                  4735
## beta_DepShore                5959
## beta_DepLetter               6106
## beta_DepQueen                6254
## beta_DepCabin                5957
## beta_DepPole                 5818
## beta_DepTicket               6101
## beta_DepGrass                6081
## beta_DepEngine               6224
## beta_polyItemPosM121         5575
## beta_polyItemPosM122         4806
## logalpha_Intercept           5171
## logalpha_polyItemPosM121     5145
## logalpha_polyItemPosM122     5319
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Serial Position Effect by Trial

mcmc_plot(TwoPL_t3spe, type = "nuts_divergence")

mcmc_plot(TwoPL_t3spe, type = "nuts_treedepth")

mcmc_plot(TwoPL_t3spe, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_t3spe, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_t3spe, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_t3spe, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_t3spe, type = "neff_hist", binwidth = 0.1)

The 2PL model with a unique serial position effect for each trial showed no estimation problems that would raise concerns about the validity of the results. Since there are no validity concerns, we can look at the results from the model overall.

mcmc_plot(TwoPL_t3spe, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_t3spe, nsamples = 50, type = "bars")

pp_check(TwoPL_t3spe, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_t3spe, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_t3spe)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + poly(ItemPos - 1, 2):Time + (-1 + Time | i | Item)
##          logalpha ~ 1 + poly(ItemPos - 1, 2):Time + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.72      0.47     0.12     1.88 1.00     8383     4410
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.29      0.38     0.75     2.19 1.00
## sd(beta_Time1)                         0.98      0.27     0.60     1.64 1.00
## sd(beta_Time2)                         0.92      0.28     0.53     1.60 1.00
## sd(logalpha_Time0)                     1.62      0.39     0.99     2.48 1.00
## sd(logalpha_Time1)                     1.02      0.30     0.59     1.74 1.00
## sd(logalpha_Time2)                     0.61      0.19     0.34     1.07 1.00
## cor(beta_Time0,beta_Time1)             0.28      0.26    -0.27     0.72 1.00
## cor(beta_Time0,beta_Time2)            -0.23      0.26    -0.67     0.30 1.00
## cor(beta_Time1,beta_Time2)             0.27      0.25    -0.27     0.70 1.00
## cor(beta_Time0,logalpha_Time0)        -0.08      0.24    -0.54     0.39 1.00
## cor(beta_Time1,logalpha_Time0)        -0.00      0.25    -0.48     0.47 1.00
## cor(beta_Time2,logalpha_Time0)         0.28      0.24    -0.22     0.69 1.00
## cor(beta_Time0,logalpha_Time1)         0.01      0.25    -0.48     0.50 1.00
## cor(beta_Time1,logalpha_Time1)         0.02      0.26    -0.48     0.51 1.00
## cor(beta_Time2,logalpha_Time1)         0.18      0.26    -0.33     0.65 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.49      0.22     0.01     0.83 1.00
## cor(beta_Time0,logalpha_Time2)        -0.08      0.27    -0.59     0.45 1.00
## cor(beta_Time1,logalpha_Time2)         0.26      0.27    -0.31     0.72 1.00
## cor(beta_Time2,logalpha_Time2)         0.31      0.26    -0.24     0.75 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.15      0.25    -0.35     0.62 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.57      0.22     0.06     0.90 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         1725     2572
## sd(beta_Time1)                         3104     4512
## sd(beta_Time2)                         1981     3521
## sd(logalpha_Time0)                     6025     6133
## sd(logalpha_Time1)                     4839     5739
## sd(logalpha_Time2)                     3797     5276
## cor(beta_Time0,beta_Time1)             3269     4028
## cor(beta_Time0,beta_Time2)             4368     4683
## cor(beta_Time1,beta_Time2)             3890     5407
## cor(beta_Time0,logalpha_Time0)         4416     5103
## cor(beta_Time1,logalpha_Time0)         5108     5530
## cor(beta_Time2,logalpha_Time0)         3481     4853
## cor(beta_Time0,logalpha_Time1)         5216     5194
## cor(beta_Time1,logalpha_Time1)         5498     5395
## cor(beta_Time2,logalpha_Time1)         4339     5357
## cor(logalpha_Time0,logalpha_Time1)     5544     6467
## cor(beta_Time0,logalpha_Time2)         7332     6378
## cor(beta_Time1,logalpha_Time2)         5843     6119
## cor(beta_Time2,logalpha_Time2)         6440     6471
## cor(logalpha_Time0,logalpha_Time2)     5595     6551
## cor(logalpha_Time1,logalpha_Time2)     5774     6057
## 
## Population-Level Effects: 
##                                Estimate Est.Error l-95% CI u-95% CI Rhat
## beta_Intercept                     0.53      0.24     0.07     1.00 1.00
## beta_DepButter                     0.09      0.08    -0.07     0.24 1.00
## beta_DepArm                       -0.11      0.09    -0.30     0.06 1.00
## beta_DepShore                      0.45      0.08     0.29     0.60 1.00
## beta_DepLetter                     0.42      0.06     0.31     0.54 1.00
## beta_DepQueen                      0.23      0.07     0.09     0.36 1.00
## beta_DepCabin                      0.18      0.07     0.05     0.31 1.00
## beta_DepPole                       0.24      0.07     0.11     0.37 1.00
## beta_DepTicket                     0.16      0.06     0.04     0.28 1.00
## beta_DepGrass                      0.48      0.07     0.34     0.63 1.00
## beta_DepEngine                     0.23      0.06     0.12     0.35 1.00
## beta_polyItemPosM121:Time0        -0.03      1.98    -3.93     3.84 1.00
## beta_polyItemPosM122:Time0         0.17      1.97    -3.70     4.13 1.00
## beta_polyItemPosM121:Time1         0.08      2.02    -3.81     4.07 1.00
## beta_polyItemPosM122:Time1         0.15      2.00    -3.81     4.16 1.00
## beta_polyItemPosM121:Time2         0.17      2.00    -3.75     4.05 1.00
## beta_polyItemPosM122:Time2         0.09      2.01    -3.91     4.04 1.00
## logalpha_Intercept                -0.26      0.72    -1.47     1.30 1.00
## logalpha_polyItemPosM121:Time0    -0.03      1.02    -2.01     1.95 1.00
## logalpha_polyItemPosM122:Time0    -0.02      1.00    -2.01     1.97 1.00
## logalpha_polyItemPosM121:Time1    -0.05      1.00    -2.02     1.92 1.00
## logalpha_polyItemPosM122:Time1     0.02      1.01    -1.95     1.99 1.00
## logalpha_polyItemPosM121:Time2    -0.07      1.01    -2.06     1.91 1.00
## logalpha_polyItemPosM122:Time2     0.00      1.00    -1.95     2.00 1.00
##                                Bulk_ESS Tail_ESS
## beta_Intercept                     1291     2051
## beta_DepButter                     6586     6053
## beta_DepArm                        4071     5410
## beta_DepShore                      5292     6252
## beta_DepLetter                    13495     6087
## beta_DepQueen                      9999     6521
## beta_DepCabin                      6465     5779
## beta_DepPole                      12614     5818
## beta_DepTicket                    12269     6536
## beta_DepGrass                     12300     6218
## beta_DepEngine                    14650     5839
## beta_polyItemPosM121:Time0        19993     5095
## beta_polyItemPosM122:Time0        18343     5407
## beta_polyItemPosM121:Time1        17260     6230
## beta_polyItemPosM122:Time1        19210     5005
## beta_polyItemPosM121:Time2        17684     5485
## beta_polyItemPosM122:Time2        18779     4872
## logalpha_Intercept                 7949     4706
## logalpha_polyItemPosM121:Time0    17364     5588
## logalpha_polyItemPosM122:Time0    19046     5239
## logalpha_polyItemPosM121:Time1    18812     5238
## logalpha_polyItemPosM122:Time1    18410     5312
## logalpha_polyItemPosM121:Time2    20050     5277
## logalpha_polyItemPosM122:Time2    18592     5775
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Item Covariates

mcmc_plot(TwoPL_itmex, type = "nuts_divergence")

mcmc_plot(TwoPL_itmex, type = "nuts_treedepth")

mcmc_plot(TwoPL_itmex, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_itmex, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_itmex, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_itmex, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_itmex, type = "neff_hist", binwidth = 0.1)

The 2PL model with all item covariates showed no estimation problems that would raise concerns about the validity of the results. The first fit did hit the maximum treedepth on about 1100 iterations; this was addressed by increasing the maximum treedepth from the default of 10 to 20. Since there are no validity concerns, we can look at the results from the model overall.
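In brms, the treedepth limit is raised through the `control` list passed on to Stan. The sketch below shows only this change; `formula_itmex` is a hypothetical name standing in for the item-covariate formula, and the other arguments mirror the settings visible in the summaries (the actual call is in the accompanying script).

```r
# Refit with a higher maximum treedepth (Stan's default is 10).
# formula_itmex is a placeholder for the item-covariate bf() formula.
TwoPL_itmex <- brm(
  formula_itmex,
  data = df_long,
  family = brmsfamily("bernoulli", link = "logit"),
  chains = 4, iter = 3000, warmup = 1000,
  control = list(max_treedepth = 20)
)
```

Hitting maximum treedepth is an efficiency warning rather than a validity problem, but a deeper limit lets the sampler complete its trajectories and typically improves effective sample sizes.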

mcmc_plot(TwoPL_itmex, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_itmex, nsamples = 50, type = "bars")

pp_check(TwoPL_itmex, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_itmex, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_itmex)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqSTX + Concrete + Density + Diversity + AoA + BOI + Phonemes + Ambiguous + NamingZ + (-1 + Time | i | Item)
##          logalpha ~ 1 + FreqSTX + Concrete + Density + Diversity + AoA + BOI + Phonemes + Ambiguous + NamingZ + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.69      0.50     0.08     1.93 1.00     5937     4592
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         0.98      0.41     0.37     1.96 1.00
## sd(beta_Time1)                         0.61      0.28     0.22     1.31 1.00
## sd(beta_Time2)                         1.06      0.38     0.47     1.96 1.00
## sd(logalpha_Time0)                     1.46      0.40     0.82     2.39 1.00
## sd(logalpha_Time1)                     0.82      0.29     0.36     1.50 1.00
## sd(logalpha_Time2)                     0.60      0.29     0.10     1.26 1.00
## cor(beta_Time0,beta_Time1)            -0.03      0.37    -0.72     0.65 1.00
## cor(beta_Time0,beta_Time2)            -0.26      0.31    -0.80     0.37 1.00
## cor(beta_Time1,beta_Time2)             0.24      0.32    -0.42     0.79 1.00
## cor(beta_Time0,logalpha_Time0)        -0.15      0.28    -0.69     0.39 1.00
## cor(beta_Time1,logalpha_Time0)         0.12      0.30    -0.46     0.69 1.00
## cor(beta_Time2,logalpha_Time0)         0.26      0.26    -0.28     0.73 1.00
## cor(beta_Time0,logalpha_Time1)        -0.04      0.30    -0.63     0.51 1.00
## cor(beta_Time1,logalpha_Time1)         0.22      0.32    -0.41     0.78 1.00
## cor(beta_Time2,logalpha_Time1)         0.21      0.29    -0.38     0.73 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.43      0.24    -0.10     0.81 1.00
## cor(beta_Time0,logalpha_Time2)        -0.14      0.33    -0.73     0.51 1.00
## cor(beta_Time1,logalpha_Time2)         0.25      0.33    -0.44     0.80 1.00
## cor(beta_Time2,logalpha_Time2)         0.10      0.33    -0.56     0.72 1.00
## cor(logalpha_Time0,logalpha_Time2)    -0.01      0.28    -0.56     0.53 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.48      0.27    -0.16     0.88 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                          974     1271
## sd(beta_Time1)                          893     2184
## sd(beta_Time2)                          875     1740
## sd(logalpha_Time0)                     4352     5094
## sd(logalpha_Time1)                     1374     1168
## sd(logalpha_Time2)                     1141      909
## cor(beta_Time0,beta_Time1)             2204     3663
## cor(beta_Time0,beta_Time2)             1563     3249
## cor(beta_Time1,beta_Time2)             1039     2074
## cor(beta_Time0,logalpha_Time0)         1525     1901
## cor(beta_Time1,logalpha_Time0)         1350     2851
## cor(beta_Time2,logalpha_Time0)         2470     3864
## cor(beta_Time0,logalpha_Time1)         2109     2562
## cor(beta_Time1,logalpha_Time1)         1692     3571
## cor(beta_Time2,logalpha_Time1)         2737     4352
## cor(logalpha_Time0,logalpha_Time1)     5256     6195
## cor(beta_Time0,logalpha_Time2)         3336     3797
## cor(beta_Time1,logalpha_Time2)         3297     4336
## cor(beta_Time2,logalpha_Time2)         3712     5248
## cor(logalpha_Time0,logalpha_Time2)     4738     5176
## cor(logalpha_Time1,logalpha_Time2)     3092     2792
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## beta_Intercept         0.43      1.27    -2.07     2.93 1.00     3194     4780
## beta_DepButter         0.03      0.08    -0.13     0.19 1.00     3091     4861
## beta_DepArm           -0.10      0.09    -0.28     0.08 1.00     2167     3725
## beta_DepShore          0.45      0.08     0.29     0.60 1.00     3955     5605
## beta_DepLetter         0.42      0.06     0.31     0.54 1.00     6814     5760
## beta_DepQueen          0.25      0.07     0.12     0.39 1.00     5940     6103
## beta_DepCabin          0.22      0.07     0.09     0.36 1.00     4583     5579
## beta_DepPole           0.27      0.07     0.14     0.41 1.00     4509     5836
## beta_DepTicket         0.16      0.06     0.04     0.28 1.00     7110     5620
## beta_DepGrass          0.49      0.07     0.34     0.63 1.00     8125     6291
## beta_DepEngine         0.23      0.06     0.12     0.35 1.00     8283     6113
## beta_FreqSTX          -0.40      0.29    -0.88     0.23 1.00      755     1996
## beta_Concrete         -0.53      0.21    -0.95    -0.09 1.00     1614     2283
## beta_Density           0.15      1.73    -3.25     3.53 1.00     5654     5754
## beta_Diversity        -0.24      0.23    -0.70     0.24 1.00     1048     2104
## beta_AoA              -0.37      0.29    -0.89     0.23 1.00      938     2352
## beta_BOI               0.68      0.22     0.25     1.10 1.00     1311     2708
## beta_Phonemes          0.00      0.19    -0.38     0.41 1.00     1685     3051
## beta_Ambiguous         0.13      0.33    -0.52     0.81 1.01     1445     2434
## beta_NamingZ           0.33      1.50    -2.71     3.26 1.00     1673     2463
## logalpha_Intercept    -0.31      0.82    -1.87     1.35 1.00     6305     5247
## logalpha_FreqSTX       0.41      0.37    -0.25     1.23 1.00     1376     1030
## logalpha_Concrete     -0.22      0.28    -0.78     0.32 1.00     1835     3167
## logalpha_Density      -0.23      0.94    -2.05     1.61 1.00     7705     5456
## logalpha_Diversity    -0.66      0.36    -1.45     0.01 1.00     1156     1195
## logalpha_AoA          -0.34      0.34    -0.99     0.35 1.00     1307     1362
## logalpha_BOI          -0.07      0.32    -0.76     0.52 1.00     1263     1050
## logalpha_Phonemes     -0.13      0.27    -0.70     0.42 1.00     1309     1736
## logalpha_Ambiguous     0.95      0.52     0.06     2.17 1.00     1096      916
## logalpha_NamingZ       0.70      0.92    -1.11     2.51 1.00     5617     5769
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Item Covariates by Trial

mcmc_plot(TwoPL_itmcp, type = "nuts_divergence")

mcmc_plot(TwoPL_itmcp, type = "nuts_treedepth")

mcmc_plot(TwoPL_itmcp, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_itmcp, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_itmcp, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_itmcp, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_itmcp, type = "neff_hist", binwidth = 0.1)

Under the default sampler settings, the 2PL model with item covariates varying across trials produced a single divergent transition after warmup. After increasing `adapt_delta` to 0.85, the model showed no estimation problems that would raise concerns about the validity of the results. Since there are no validity concerns, we can look at the results from the model overall.
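Raising `adapt_delta` above Stan's default of 0.8 forces a smaller step size and usually eliminates occasional divergences. As a sketch (the actual call is in the accompanying script), the refit can reuse the compiled model from the first fit via `update()`:

```r
# Refit with a smaller step size by raising adapt_delta from the
# default 0.8 to 0.85; update() reuses the already-compiled Stan model.
TwoPL_itmcp <- update(
  TwoPL_itmcp,
  control = list(adapt_delta = 0.85)
)
```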

mcmc_plot(TwoPL_itmcp, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_itmcp, nsamples = 50, type = "bars")

pp_check(TwoPL_itmcp, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_itmcp, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary integrates basic model validity statistics and posterior summaries for additional parameters. See the guide at the end of the Rasch intercept tab for details regarding the meaning of each value.

summary(TwoPL_itmcp)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqSTX:Time + Concrete:Time + Density:Time + Diversity:Time + AoA:Time + BOI:Time + Phonemes:Time + Ambiguous:Time + NamingZ:Time + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.74      0.48     0.12     1.92 1.00     8482     5077
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         0.84      0.51     0.29     2.21 1.00
## sd(beta_Time1)                         0.78      0.50     0.25     2.09 1.00
## sd(beta_Time2)                         0.96      0.55     0.35     2.43 1.00
## sd(logalpha_Time0)                     1.56      0.39     0.95     2.46 1.00
## sd(logalpha_Time1)                     0.95      0.27     0.55     1.60 1.00
## sd(logalpha_Time2)                     0.62      0.19     0.35     1.07 1.00
## cor(beta_Time0,beta_Time1)            -0.13      0.36    -0.77     0.57 1.00
## cor(beta_Time0,beta_Time2)            -0.18      0.35    -0.79     0.54 1.00
## cor(beta_Time1,beta_Time2)             0.19      0.36    -0.53     0.81 1.00
## cor(beta_Time0,logalpha_Time0)        -0.29      0.31    -0.82     0.34 1.00
## cor(beta_Time1,logalpha_Time0)         0.20      0.30    -0.40     0.75 1.00
## cor(beta_Time2,logalpha_Time0)         0.28      0.30    -0.34     0.80 1.00
## cor(beta_Time0,logalpha_Time1)        -0.22      0.30    -0.74     0.39 1.00
## cor(beta_Time1,logalpha_Time1)         0.29      0.31    -0.36     0.81 1.00
## cor(beta_Time2,logalpha_Time1)         0.29      0.30    -0.34     0.81 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.50      0.22     0.01     0.85 1.00
## cor(beta_Time0,logalpha_Time2)        -0.06      0.32    -0.65     0.56 1.00
## cor(beta_Time1,logalpha_Time2)         0.23      0.33    -0.44     0.78 1.00
## cor(beta_Time2,logalpha_Time2)         0.18      0.32    -0.46     0.74 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.15      0.25    -0.35     0.63 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.58      0.22     0.07     0.90 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         2294     3504
## sd(beta_Time1)                         1928     2735
## sd(beta_Time2)                         2350     3575
## sd(logalpha_Time0)                     8158     6761
## sd(logalpha_Time1)                     5720     6319
## sd(logalpha_Time2)                     5774     5696
## cor(beta_Time0,beta_Time1)             6431     5608
## cor(beta_Time0,beta_Time2)             6132     6336
## cor(beta_Time1,beta_Time2)             5529     6265
## cor(beta_Time0,logalpha_Time0)         2559     3798
## cor(beta_Time1,logalpha_Time0)         3281     5031
## cor(beta_Time2,logalpha_Time0)         2845     5349
## cor(beta_Time0,logalpha_Time1)         2595     3735
## cor(beta_Time1,logalpha_Time1)         2675     4203
## cor(beta_Time2,logalpha_Time1)         3592     5415
## cor(logalpha_Time0,logalpha_Time1)     8222     6865
## cor(beta_Time0,logalpha_Time2)         2662     5081
## cor(beta_Time1,logalpha_Time2)         2890     4239
## cor(beta_Time2,logalpha_Time2)         4168     5783
## cor(logalpha_Time0,logalpha_Time2)     8663     7221
## cor(logalpha_Time1,logalpha_Time2)     7299     7178
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept           0.46      0.88    -1.25     2.23 1.00     6631
## beta_DepButter           0.07      0.08    -0.09     0.22 1.00     7138
## beta_DepArm             -0.12      0.09    -0.32     0.05 1.00     4158
## beta_DepShore            0.46      0.08     0.31     0.61 1.00     7041
## beta_DepLetter           0.41      0.06     0.29     0.53 1.00    14625
## beta_DepQueen            0.23      0.07     0.10     0.37 1.00    12981
## beta_DepCabin            0.20      0.07     0.07     0.33 1.00     7984
## beta_DepPole             0.23      0.07     0.10     0.36 1.00    14848
## beta_DepTicket           0.15      0.06     0.03     0.28 1.00    13921
## beta_DepGrass            0.47      0.08     0.32     0.62 1.00    18416
## beta_DepEngine           0.23      0.06     0.12     0.35 1.00    16121
## beta_FreqSTX:Time0      -0.42      0.39    -1.23     0.40 1.00     3742
## beta_FreqSTX:Time1      -0.66      0.37    -1.36     0.20 1.00     3733
## beta_FreqSTX:Time2       0.16      0.44    -0.75     1.10 1.00     3525
## beta_Time0:Concrete     -0.57      0.45    -1.43     0.46 1.00     3366
## beta_Time1:Concrete     -0.35      0.44    -1.23     0.55 1.00     3434
## beta_Time2:Concrete     -0.40      0.49    -1.41     0.64 1.00     3996
## beta_Time0:Density      -0.34      1.52    -3.39     2.68 1.00     8284
## beta_Time1:Density       0.20      1.52    -2.81     3.17 1.00     7837
## beta_Time2:Density       0.34      1.51    -2.65     3.31 1.00     7767
## beta_Time0:Diversity    -0.59      0.44    -1.44     0.41 1.00     2781
## beta_Time1:Diversity     0.08      0.43    -0.79     0.98 1.00     3467
## beta_Time2:Diversity     0.11      0.50    -0.93     1.17 1.00     3920
## beta_Time0:AoA          -0.24      0.47    -1.19     0.76 1.00     3021
## beta_Time1:AoA          -0.52      0.46    -1.40     0.48 1.00     2935
## beta_Time2:AoA           0.09      0.53    -1.02     1.16 1.00     3844
## beta_Time0:BOI           0.90      0.41    -0.02     1.72 1.00     4022
## beta_Time1:BOI           0.54      0.39    -0.29     1.31 1.00     4207
## beta_Time2:BOI           0.46      0.46    -0.52     1.41 1.00     4603
## beta_Time0:Phonemes     -0.10      0.43    -0.99     0.82 1.00     3211
## beta_Time1:Phonemes      0.09      0.41    -0.79     0.93 1.00     3491
## beta_Time2:Phonemes      0.10      0.49    -0.94     1.12 1.00     3531
## beta_Time0:Ambiguous     0.18      0.61    -1.17     1.39 1.00     3674
## beta_Time1:Ambiguous    -0.11      0.58    -1.22     1.13 1.00     3674
## beta_Time2:Ambiguous     0.11      0.67    -1.26     1.52 1.00     4154
## beta_Time0:NamingZ       0.90      1.57    -2.20     3.93 1.00     4691
## beta_Time1:NamingZ      -0.09      1.53    -3.11     2.85 1.00     6627
## beta_Time2:NamingZ      -0.39      1.57    -3.47     2.72 1.00     7655
## logalpha_Intercept      -0.22      0.71    -1.45     1.31 1.00     8736
##                      Tail_ESS
## beta_Intercept           5756
## beta_DepButter           6334
## beta_DepArm              5596
## beta_DepShore            6102
## beta_DepLetter           6442
## beta_DepQueen            6443
## beta_DepCabin            6441
## beta_DepPole             5960
## beta_DepTicket           6317
## beta_DepGrass            5768
## beta_DepEngine           5463
## beta_FreqSTX:Time0       3415
## beta_FreqSTX:Time1       3097
## beta_FreqSTX:Time2       3486
## beta_Time0:Concrete      3623
## beta_Time1:Concrete      3436
## beta_Time2:Concrete      3704
## beta_Time0:Density       6197
## beta_Time1:Density       6200
## beta_Time2:Density       6326
## beta_Time0:Diversity     3026
## beta_Time1:Diversity     3458
## beta_Time2:Diversity     4139
## beta_Time0:AoA           3574
## beta_Time1:AoA           3082
## beta_Time2:AoA           4068
## beta_Time0:BOI           4037
## beta_Time1:BOI           3706
## beta_Time2:BOI           3981
## beta_Time0:Phonemes      3336
## beta_Time1:Phonemes      2884
## beta_Time2:Phonemes      3777
## beta_Time0:Ambiguous     3490
## beta_Time1:Ambiguous     3895
## beta_Time2:Ambiguous     4181
## beta_Time0:NamingZ       6093
## beta_Time1:NamingZ       5989
## beta_Time2:NamingZ       6384
## logalpha_Intercept       4760
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Item Covariates with Unique Interactions

mcmc_plot(TwoPL_itmcr, type = "nuts_divergence")

mcmc_plot(TwoPL_itmcr, type = "nuts_treedepth")

mcmc_plot(TwoPL_itmcr, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_itmcr, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_itmcr, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_itmcr, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_itmcr, type = "neff_hist", binwidth = 0.1)

The 2PL model with collapsed trial interactions for the item covariates showed no evidence of estimation problems that would call the validity of the results into question. With no such concerns, we can examine the overall model results.

mcmc_plot(TwoPL_itmcr, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_itmcr, nsamples = 50, type = "bars")

pp_check(TwoPL_itmcr, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_itmcr, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary combines basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details on the meaning of each value.

summary(TwoPL_itmcr)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqHAL:Trial23 + Concrete:Trial23 + Diversity:Trial23 + AoA:Trial12 + BOI:Trial23 + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.72      0.47     0.12     1.86 1.00     7562     4929
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.24      0.44     0.65     2.33 1.00
## sd(beta_Time1)                         0.35      0.22     0.07     0.92 1.00
## sd(beta_Time2)                         0.90      0.31     0.46     1.66 1.00
## sd(logalpha_Time0)                     1.61      0.39     0.98     2.48 1.00
## sd(logalpha_Time1)                     1.00      0.29     0.57     1.67 1.00
## sd(logalpha_Time2)                     0.63      0.19     0.35     1.09 1.00
## cor(beta_Time0,beta_Time1)             0.10      0.36    -0.57     0.75 1.00
## cor(beta_Time0,beta_Time2)            -0.49      0.30    -0.92     0.19 1.00
## cor(beta_Time1,beta_Time2)            -0.11      0.36    -0.74     0.60 1.00
## cor(beta_Time0,logalpha_Time0)        -0.12      0.25    -0.59     0.36 1.00
## cor(beta_Time1,logalpha_Time0)        -0.26      0.29    -0.77     0.33 1.00
## cor(beta_Time2,logalpha_Time0)         0.18      0.25    -0.32     0.63 1.00
## cor(beta_Time0,logalpha_Time1)        -0.04      0.26    -0.54     0.46 1.00
## cor(beta_Time1,logalpha_Time1)         0.00      0.29    -0.57     0.56 1.00
## cor(beta_Time2,logalpha_Time1)         0.17      0.27    -0.36     0.68 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.51      0.22     0.01     0.85 1.00
## cor(beta_Time0,logalpha_Time2)        -0.12      0.28    -0.64     0.42 1.00
## cor(beta_Time1,logalpha_Time2)         0.15      0.31    -0.48     0.70 1.00
## cor(beta_Time2,logalpha_Time2)         0.21      0.28    -0.35     0.70 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.19      0.25    -0.31     0.64 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.59      0.22     0.06     0.91 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         2515     3789
## sd(beta_Time1)                         1123     1921
## sd(beta_Time2)                         1766     2882
## sd(logalpha_Time0)                     6649     5889
## sd(logalpha_Time1)                     4630     5681
## sd(logalpha_Time2)                     3861     4952
## cor(beta_Time0,beta_Time1)             3326     4473
## cor(beta_Time0,beta_Time2)             2144     3743
## cor(beta_Time1,beta_Time2)             3079     4115
## cor(beta_Time0,logalpha_Time0)         4146     5058
## cor(beta_Time1,logalpha_Time0)         1741     3437
## cor(beta_Time2,logalpha_Time0)         3440     4517
## cor(beta_Time0,logalpha_Time1)         4171     5221
## cor(beta_Time1,logalpha_Time1)         2186     2879
## cor(beta_Time2,logalpha_Time1)         2938     3786
## cor(logalpha_Time0,logalpha_Time1)     6876     6184
## cor(beta_Time0,logalpha_Time2)         5598     5503
## cor(beta_Time1,logalpha_Time2)         2587     3960
## cor(beta_Time2,logalpha_Time2)         4298     6321
## cor(logalpha_Time0,logalpha_Time2)     6686     6637
## cor(logalpha_Time1,logalpha_Time2)     5921     6744
## 
## Population-Level Effects: 
##                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept              0.51      0.14     0.23     0.80 1.00     1343
## beta_DepButter              0.04      0.08    -0.12     0.19 1.00     3429
## beta_DepArm                -0.09      0.08    -0.27     0.07 1.00     3475
## beta_DepShore               0.45      0.07     0.31     0.59 1.00     3903
## beta_DepLetter              0.41      0.06     0.29     0.52 1.00     7083
## beta_DepQueen               0.24      0.07     0.11     0.38 1.00     9626
## beta_DepCabin               0.21      0.06     0.08     0.33 1.00     6833
## beta_DepPole                0.24      0.06     0.12     0.37 1.00     8257
## beta_DepTicket              0.14      0.06     0.02     0.26 1.00     7260
## beta_DepGrass               0.48      0.07     0.34     0.62 1.00     8814
## beta_DepEngine              0.27      0.06     0.15     0.39 1.00     5415
## beta_FreqHAL:Trial230       0.07      0.47    -0.87     1.01 1.00     1795
## beta_FreqHAL:Trial231      -0.49      0.18    -0.77    -0.06 1.00     1261
## beta_Trial230:Concrete     -0.48      0.51    -1.49     0.62 1.00     2659
## beta_Trial231:Concrete     -0.54      0.18    -0.89    -0.17 1.00     2246
## beta_Trial230:Diversity    -0.61      0.42    -1.45     0.23 1.00     2742
## beta_Trial231:Diversity    -0.01      0.16    -0.30     0.34 1.00     1951
## beta_AoA:Trial120          -0.03      0.29    -0.64     0.51 1.00     2748
## beta_AoA:Trial121          -0.33      0.18    -0.63     0.07 1.00     1945
## beta_Trial230:BOI           0.78      0.50    -0.31     1.77 1.00     2504
## beta_Trial231:BOI           0.74      0.18     0.36     1.08 1.00     2098
## logalpha_Intercept         -0.25      0.71    -1.48     1.32 1.00     7268
##                         Tail_ESS
## beta_Intercept              1677
## beta_DepButter              5123
## beta_DepArm                 4481
## beta_DepShore               6092
## beta_DepLetter              6211
## beta_DepQueen               6419
## beta_DepCabin               6412
## beta_DepPole                6195
## beta_DepTicket              6307
## beta_DepGrass               5771
## beta_DepEngine              4417
## beta_FreqHAL:Trial230       3412
## beta_FreqHAL:Trial231       2061
## beta_Trial230:Concrete      3667
## beta_Trial231:Concrete      3140
## beta_Trial230:Diversity     3746
## beta_Trial231:Diversity     3357
## beta_AoA:Trial120           4376
## beta_AoA:Trial121           2343
## beta_Trial230:BOI           3735
## beta_Trial231:BOI           2696
## logalpha_Intercept          4923
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Reduced Item Covariates Model

mcmc_plot(TwoPL_itmsd, type = "nuts_divergence")

mcmc_plot(TwoPL_itmsd, type = "nuts_treedepth")

mcmc_plot(TwoPL_itmsd, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_itmsd, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_itmsd, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_itmsd, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_itmsd, type = "neff_hist", binwidth = 0.1)

The 2PL model with the reduced item covariates showed no evidence of estimation problems that would call the validity of the results into question. With no such concerns, we can examine the overall model results.

mcmc_plot(TwoPL_itmsd, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_itmsd, nsamples = 50, type = "bars")

pp_check(TwoPL_itmsd, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_itmsd, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary combines basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details on the meaning of each value.

summary(TwoPL_itmsd)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqHAL:Trial23 + Concrete + BOI + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.73      0.48     0.11     1.90 1.00     7194     4996
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.20      0.37     0.69     2.10 1.00
## sd(beta_Time1)                         0.63      0.23     0.31     1.19 1.00
## sd(beta_Time2)                         0.70      0.26     0.36     1.35 1.00
## sd(logalpha_Time0)                     1.63      0.39     0.99     2.52 1.00
## sd(logalpha_Time1)                     1.02      0.30     0.58     1.74 1.00
## sd(logalpha_Time2)                     0.62      0.19     0.35     1.07 1.00
## cor(beta_Time0,beta_Time1)             0.05      0.31    -0.54     0.64 1.00
## cor(beta_Time0,beta_Time2)            -0.52      0.25    -0.88     0.07 1.00
## cor(beta_Time1,beta_Time2)            -0.10      0.30    -0.63     0.50 1.00
## cor(beta_Time0,logalpha_Time0)        -0.12      0.24    -0.57     0.37 1.00
## cor(beta_Time1,logalpha_Time0)        -0.07      0.26    -0.55     0.44 1.00
## cor(beta_Time2,logalpha_Time0)         0.27      0.25    -0.26     0.72 1.00
## cor(beta_Time0,logalpha_Time1)         0.08      0.26    -0.44     0.55 1.00
## cor(beta_Time1,logalpha_Time1)         0.07      0.26    -0.45     0.57 1.00
## cor(beta_Time2,logalpha_Time1)         0.31      0.27    -0.26     0.78 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.50      0.22     0.01     0.84 1.00
## cor(beta_Time0,logalpha_Time2)        -0.08      0.27    -0.58     0.45 1.00
## cor(beta_Time1,logalpha_Time2)         0.30      0.26    -0.26     0.74 1.00
## cor(beta_Time2,logalpha_Time2)         0.29      0.26    -0.26     0.74 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.17      0.25    -0.33     0.63 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.57      0.22     0.05     0.90 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         1769     3143
## sd(beta_Time1)                         2063     3750
## sd(beta_Time2)                         1044     2187
## sd(logalpha_Time0)                     5892     5679
## sd(logalpha_Time1)                     4496     6054
## sd(logalpha_Time2)                     4259     5653
## cor(beta_Time0,beta_Time1)             2301     4152
## cor(beta_Time0,beta_Time2)             3094     4856
## cor(beta_Time1,beta_Time2)             2539     3558
## cor(beta_Time0,logalpha_Time0)         3350     5421
## cor(beta_Time1,logalpha_Time0)         3208     5456
## cor(beta_Time2,logalpha_Time0)         1874     3070
## cor(beta_Time0,logalpha_Time1)         3720     4426
## cor(beta_Time1,logalpha_Time1)         4428     4968
## cor(beta_Time2,logalpha_Time1)         2243     5077
## cor(logalpha_Time0,logalpha_Time1)     7382     6238
## cor(beta_Time0,logalpha_Time2)         7321     6173
## cor(beta_Time1,logalpha_Time2)         4670     6106
## cor(beta_Time2,logalpha_Time2)         5016     5872
## cor(logalpha_Time0,logalpha_Time2)     6688     6748
## cor(logalpha_Time1,logalpha_Time2)     5772     6418
## 
## Population-Level Effects: 
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept            0.57      0.17     0.22     0.89 1.01      818
## beta_DepButter            0.06      0.08    -0.09     0.21 1.00     4585
## beta_DepArm              -0.11      0.09    -0.28     0.07 1.00     3654
## beta_DepShore             0.44      0.08     0.29     0.59 1.00     4416
## beta_DepLetter            0.42      0.06     0.31     0.53 1.00    11617
## beta_DepQueen             0.25      0.07     0.11     0.39 1.00     8715
## beta_DepCabin             0.21      0.07     0.08     0.33 1.00     6436
## beta_DepPole              0.23      0.06     0.10     0.36 1.00     9885
## beta_DepTicket            0.14      0.06     0.02     0.26 1.00     8377
## beta_DepGrass             0.49      0.07     0.36     0.63 1.00     8994
## beta_DepEngine            0.23      0.05     0.12     0.34 1.00    11609
## beta_Concrete            -0.47      0.13    -0.73    -0.21 1.00     2209
## beta_BOI                  0.69      0.14     0.44     0.98 1.00     1608
## beta_FreqHAL:Trial230    -0.10      0.37    -0.85     0.63 1.00     1478
## beta_FreqHAL:Trial231    -0.23      0.18    -0.57     0.12 1.00     1226
## logalpha_Intercept       -0.26      0.73    -1.51     1.36 1.00     6666
##                       Tail_ESS
## beta_Intercept            1205
## beta_DepButter            5451
## beta_DepArm               5400
## beta_DepShore             5102
## beta_DepLetter            6071
## beta_DepQueen             6027
## beta_DepCabin             6217
## beta_DepPole              6012
## beta_DepTicket            6194
## beta_DepGrass             6173
## beta_DepEngine            6447
## beta_Concrete             2643
## beta_BOI                  2093
## beta_FreqHAL:Trial230     2432
## beta_FreqHAL:Trial231     2340
## logalpha_Intercept        5130
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Final Item Covariate Model

mcmc_plot(TwoPL_itmfn, type = "nuts_divergence")

mcmc_plot(TwoPL_itmfn, type = "nuts_treedepth")

mcmc_plot(TwoPL_itmfn, type = "trace", regex_pars = "b_")
## No divergences to plot.

mcmc_plot(TwoPL_itmfn, type = "trace", regex_pars = "sd_")
## No divergences to plot.

mcmc_plot(TwoPL_itmfn, type = "trace", regex_pars = "cor_")
## No divergences to plot.

mcmc_plot(TwoPL_itmfn, type = "rhat_hist", binwidth = 0.0001)

mcmc_plot(TwoPL_itmfn, type = "neff_hist", binwidth = 0.1)

The 2PL model with the final reduction in the covariates showed no evidence of estimation problems that would call the validity of the results into question. With no such concerns, we can examine the overall model results.

mcmc_plot(TwoPL_itmfn, type = "areas_ridges", prob = 0.80, prob_outer = 0.95, regex_pars = "b_")

pp_check(TwoPL_itmfn, nsamples = 50, type = "bars")

pp_check(TwoPL_itmfn, nsamples = 50, type = "bars_grouped", group = "Item")

pp_check(TwoPL_itmfn, nsamples = 50, type = "bars_grouped", group = "ID", newdata = subset(df_long, df_long$ID %in% as.factor(sample.int(n = 1219, size = 6, replace = FALSE))))

Like the other 2PL models, this model estimates responses with high accuracy. The following general model summary combines basic model validity statistics with posterior summaries for the remaining parameters. See the guide at the end of the Rasch intercept tab for details on the meaning of each value.

summary(TwoPL_itmfn)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: Resp ~ beta + exp(logalpha) * theta 
##          theta ~ 0 + (1 | ID)
##          beta ~ 1 + DepButter + DepArm + DepShore + DepLetter + DepQueen + DepCabin + DepPole + DepTicket + DepGrass + DepEngine + FreqHAL:Trial23 + AoA:Trial12 + BOI + Concrete + (-1 + Time | i | Item)
##          logalpha ~ 1 + (-1 + Time | i | Item)
##    Data: df_long (Number of observations: 36570) 
## Samples: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
##          total post-warmup samples = 8000
## 
## Group-Level Effects: 
## ~ID (Number of levels: 1219) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(theta_Intercept)     0.73      0.47     0.12     1.91 1.00     7412     5016
## 
## ~Item (Number of levels: 10) 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(beta_Time0)                         1.40      0.40     0.81     2.34 1.00
## sd(beta_Time1)                         0.32      0.20     0.06     0.84 1.00
## sd(beta_Time2)                         0.89      0.29     0.44     1.58 1.00
## sd(logalpha_Time0)                     1.61      0.40     0.98     2.51 1.00
## sd(logalpha_Time1)                     1.02      0.29     0.59     1.70 1.00
## sd(logalpha_Time2)                     0.61      0.19     0.34     1.09 1.00
## cor(beta_Time0,beta_Time1)             0.18      0.35    -0.51     0.80 1.00
## cor(beta_Time0,beta_Time2)            -0.52      0.29    -0.91     0.17 1.00
## cor(beta_Time1,beta_Time2)            -0.15      0.36    -0.76     0.58 1.00
## cor(beta_Time0,logalpha_Time0)        -0.09      0.23    -0.54     0.36 1.00
## cor(beta_Time1,logalpha_Time0)        -0.23      0.29    -0.76     0.36 1.00
## cor(beta_Time2,logalpha_Time0)         0.19      0.24    -0.29     0.62 1.00
## cor(beta_Time0,logalpha_Time1)         0.06      0.25    -0.44     0.53 1.00
## cor(beta_Time1,logalpha_Time1)         0.03      0.28    -0.53     0.55 1.00
## cor(beta_Time2,logalpha_Time1)         0.20      0.26    -0.32     0.66 1.00
## cor(logalpha_Time0,logalpha_Time1)     0.51      0.21     0.03     0.84 1.00
## cor(beta_Time0,logalpha_Time2)        -0.14      0.27    -0.63     0.42 1.00
## cor(beta_Time1,logalpha_Time2)         0.17      0.30    -0.44     0.71 1.00
## cor(beta_Time2,logalpha_Time2)         0.24      0.28    -0.34     0.72 1.00
## cor(logalpha_Time0,logalpha_Time2)     0.19      0.25    -0.31     0.65 1.00
## cor(logalpha_Time1,logalpha_Time2)     0.58      0.23     0.05     0.91 1.00
##                                    Bulk_ESS Tail_ESS
## sd(beta_Time0)                         2433     3468
## sd(beta_Time1)                         1195     1616
## sd(beta_Time2)                         1625     2354
## sd(logalpha_Time0)                     6103     6517
## sd(logalpha_Time1)                     4922     6086
## sd(logalpha_Time2)                     5038     6050
## cor(beta_Time0,beta_Time1)             3586     4992
## cor(beta_Time0,beta_Time2)             1817     3708
## cor(beta_Time1,beta_Time2)             2395     4081
## cor(beta_Time0,logalpha_Time0)         4537     5640
## cor(beta_Time1,logalpha_Time0)         1841     3291
## cor(beta_Time2,logalpha_Time0)         3460     4079
## cor(beta_Time0,logalpha_Time1)         5202     5551
## cor(beta_Time1,logalpha_Time1)         2920     4875
## cor(beta_Time2,logalpha_Time1)         3622     4573
## cor(logalpha_Time0,logalpha_Time1)     6797     6956
## cor(beta_Time0,logalpha_Time2)         6741     6356
## cor(beta_Time1,logalpha_Time2)         3391     5140
## cor(beta_Time2,logalpha_Time2)         4763     6157
## cor(logalpha_Time0,logalpha_Time2)     6863     5976
## cor(logalpha_Time1,logalpha_Time2)     5907     6416
## 
## Population-Level Effects: 
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## beta_Intercept            0.55      0.12     0.33     0.81 1.00     1491
## beta_DepButter            0.05      0.07    -0.08     0.19 1.00     3452
## beta_DepArm              -0.10      0.08    -0.27     0.06 1.00     3395
## beta_DepShore             0.44      0.07     0.30     0.58 1.00     3522
## beta_DepLetter            0.40      0.06     0.28     0.52 1.00     6632
## beta_DepQueen             0.25      0.07     0.12     0.39 1.00     8389
## beta_DepCabin             0.20      0.06     0.08     0.33 1.00     5886
## beta_DepPole              0.25      0.06     0.12     0.37 1.00     7173
## beta_DepTicket            0.14      0.06     0.02     0.25 1.00     5037
## beta_DepGrass             0.47      0.07     0.34     0.61 1.00     6971
## beta_DepEngine            0.27      0.06     0.15     0.39 1.00     4899
## beta_BOI                  0.70      0.12     0.46     0.94 1.00     2220
## beta_Concrete            -0.49      0.11    -0.71    -0.26 1.00     2832
## beta_FreqHAL:Trial230     0.04      0.48    -0.91     0.94 1.01     1130
## beta_FreqHAL:Trial231    -0.47      0.17    -0.74    -0.07 1.00     1148
## beta_AoA:Trial120         0.10      0.30    -0.53     0.66 1.00     2139
## beta_AoA:Trial121        -0.34      0.15    -0.63     0.00 1.00     2101
## logalpha_Intercept       -0.24      0.71    -1.45     1.32 1.00     7352
##                       Tail_ESS
## beta_Intercept            2004
## beta_DepButter            5075
## beta_DepArm               4799
## beta_DepShore             4761
## beta_DepLetter            5113
## beta_DepQueen             5696
## beta_DepCabin             6414
## beta_DepPole              5788
## beta_DepTicket            5464
## beta_DepGrass             5961
## beta_DepEngine            3370
## beta_BOI                  3000
## beta_Concrete             3392
## beta_FreqHAL:Trial230     2169
## beta_FreqHAL:Trial231     1875
## beta_AoA:Trial120         4035
## beta_AoA:Trial121         2297
## logalpha_Intercept        4750
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Model Comparisons

Since not all models were compared to one another, the results shown here are perhaps best contextualized by viewing them alongside the modeling flowchart. To help with understanding the modeling choices, some discussion of how each model comparison was interpreted and used to inform the next model fit is provided after each comparison. Again, the comparison of models was based on a combination of the LOOIC and pseudo-BMA weights. For those unfamiliar with these methods, a brief description of each criterion is provided here.

The LOOIC (leave-one-out cross-validation information criterion) serves a similar function to other information criteria such as the AIC or BIC. The LOOIC aims to estimate the expected log predictive density (ELPD) of a model fit on new data, or, framed another way, the expected model performance in out-of-sample prediction. The computation of the LOOIC assumes that any single observation can be omitted without substantially altering the posterior distribution, which is where the idea of a leave-one-out criterion comes from. Comparing LOOIC values thus directly compares models on their expected out-of-sample predictive accuracy. Like other information criteria, a lower LOOIC corresponds to better model performance. An important caveat to the LOOIC is that brms and related Stan packages (e.g., loo) favor reporting the actual ELPD differences rather than the more traditional information criterion statistic (\(-2\times ELPD\)). The multiplication of the ELPD by -2 is done for historical reasons, as this simple transformation causes the information criterion to follow a chi-squared distribution, making it possible to conduct likelihood ratio tests. In a Bayesian framework, access to the posterior permits direct comparison (e.g., \(ELPD_{Model1}-ELPD_{Model2}\)) since the standard error of this difference can also be computed. Using that standard error, a “significant” difference in model performance can be declared when \(|ELPD_{Diff}| > 1.96\times{SE}\), or more informally by checking whether the difference in ELPD is at least twice as large as its standard error.
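As a concrete illustration of this decision rule, the following sketch plugs in the ELPD difference and standard error reported for the Rasch vs. 2PL comparison shown later in this section (in practice these values come from loo::loo_compare()):

```r
# The informal "twice the standard error" rule, using the values reported
# in the Rasch vs. 2PL comparison table in this section.
elpd_diff <- -191.97   # ELPD(Rasch) - ELPD(2PL), from the comparison table
se_diff   <- 24.23     # standard error of that difference

looic_diff <- -2 * elpd_diff     # the same difference on the LOOIC scale
abs(elpd_diff) > 2 * se_diff     # TRUE: the difference is credible
abs(elpd_diff) / se_diff         # roughly 8 standard errors
```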

The pseudo-BMA (pseudo-Bayesian model averaging) method is a form of model stacking/weighting/averaging. In this particular framework, the aim of the pseudo-BMA method is to weight models by their expected log predictive density. A model given greater weight is preferred, while models sharing similar weights suggest that no model is clearly superior.
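The core of the pseudo-BMA weight is simple: each model's weight is proportional to the exponentiated ELPD. The sketch below uses hypothetical ELPD values and omits the Bayesian-bootstrap regularization that loo::loo_model_weights() applies for its pseudo-BMA method, so the values will not match the tables below exactly:

```r
# Minimal pseudo-BMA sketch: weights proportional to exp(ELPD),
# using hypothetical ELPD values for two models.
elpd <- c(TwoPL = -18250.4, Rasch = -18442.4)

w <- exp(elpd - max(elpd))   # subtract the max for numerical stability
w <- w / sum(w)
round(w, 3)                  # essentially all weight on the better model
```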

ModelComparisons <- readRDS("ModelComparisons.rds")

matrix(c(ModelComparisons$LOOIC$Comparison.1[, 1:2],
         ModelComparisons$BMAs$Comparison.1[2:1]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("2PL", "Rasch"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Rasch and 2PL Intercept Models", digits = c(2, 2, 59), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Rasch and 2PL Intercept Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
2PL 0.00 0.00 1.00e+00
Rasch -191.97 24.23 9.92e-57

In the comparison of the Rasch and 2PL models, both the LOOIC and pseudo-BMA model weights agree that the 2PL model is preferable. As a quick clarifying note, it may appear at first glance that the Rasch model had the smaller LOOIC since the ELPD difference listed for it is negative. It is important to remember that each model's ELPD difference is always taken relative to the best-performing model. Given this arrangement, the negative value reflects that the Rasch model had a smaller ELPD than the 2PL model, in the same way that the difference between 5 and 10 is -5. The ELPD difference is about 8 times larger than its standard error, which supports that the difference is real. This is congruent with the pseudo-BMA weights, wherein the 2PL model is given essentially all of the weight in the comparison of the two models.

matrix(c(ModelComparisons$LOOIC$Comparison.2[, 1:2],
         ModelComparisons$BMAs$Comparison.2[sapply(row.names(ModelComparisons$LOOIC$Comparison.2), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.2)$names)})]),
       nrow = 5, ncol = 3, byrow = FALSE,
       dimnames = list(c("Varying Trials, Fixed Intercept", "Varying Trials, Fixed and Random Intercept", "Trial 1, Fixed Intercept", "Trial 1, Fixed and Random Intercepts", "Fixed Trials, Fixed and Random Intercept"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the 2PL Intercept Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the 2PL Intercept Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Varying Trials, Fixed Intercept 0.00 0.00 0.98
Varying Trials, Fixed and Random Intercept -6.15 2.26 0.02
Trial 1, Fixed Intercept -354.86 26.46 0.00
Trial 1, Fixed and Random Intercepts -355.25 26.48 0.00
Fixed Trials, Fixed and Random Intercept -2050.21 61.17 0.00

Comparing the various specifications of the intercept-only 2PL models also demonstrates clear concordance of the LOOIC and pseudo-BMA model weights. A model that allows items to vary over all three trials and specifies only a fixed (rather than a fixed and random) intercept for both difficulty and discrimination is the most preferred model.

matrix(c(ModelComparisons$LOOIC$Comparison.3[, 1:2],
         ModelComparisons$BMAs$Comparison.3[sapply(row.names(ModelComparisons$LOOIC$Comparison.3), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.3)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Local Dependency", "Varying Trials, Fixed Intercept"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the 2PL Intercept and Local Dependency Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the 2PL Intercept and Local Dependency Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Local Dependency 0.00 0.00 0.95
Varying Trials, Fixed Intercept -26.96 15.92 0.05

Comparing the preferred intercept model to one that includes the dependency of recalling items over trials, the LOOIC and pseudo-BMA weights support preferring the local dependency effect.

matrix(c(ModelComparisons$LOOIC$Comparison.4[, 1:2],
         ModelComparisons$BMAs$Comparison.4[sapply(row.names(ModelComparisons$LOOIC$Comparison.4), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.4)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Dependency on Easiness Only", "Local Dependency"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Local Dependency Model on both Parameters vs Easiness Only", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Local Dependency Model on both Parameters vs Easiness Only
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Dependency on Easiness Only 0.00 0.00 0.99
Local Dependency -12.92 5.88 0.01

Dropping the local dependency effect on the items’ discrimination parameter resulted in a model preferred by both the LOOIC and pseudo-BMA model weights. This indicates that an item’s discrimination varies across trials in the same manner regardless of whether the word was recalled earlier, suggesting that the item’s position is more important in determining its discrimination. In contrast, recalling a word on an earlier trial does affect its easiness on a later trial.

matrix(c(ModelComparisons$LOOIC$Comparison.5[, 1:2],
         ModelComparisons$BMAs$Comparison.5[sapply(row.names(ModelComparisons$LOOIC$Comparison.5), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.5)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Easiness Only", "Uniform Dependency"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Item-specific vs Uniform Local Dependency Models", digits = c(2, 2, 14), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Item-specific vs Uniform Local Dependency Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Easiness Only 0.00 0.00 1.00e+00
Uniform Dependency -44.97 8.69 3.03e-12

Assuming a uniform dependency effect across all words resulted in poorer model performance per both the LOOIC and pseudo-BMA weights. This result indicates that there is a unique interaction between prior learning and the words themselves. If it were simply the case that learning a word on a prior trial produced a fixed change in its likelihood of being recalled later, then the uniform dependency effect would have been preferred. It appears instead that some words benefit from being learned earlier while others do not, which is consistent with the fact that not all of the word-specific local dependency effects are credibly different from zero.

matrix(c(ModelComparisons$LOOIC$Comparison.6[, 1:2],
         ModelComparisons$BMAs$Comparison.6[sapply(row.names(ModelComparisons$LOOIC$Comparison.6), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.6)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Local Dependency", "Learning/Growth"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Local Dependency and Learning Models", digits = c(2, 2, 3), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Local Dependency and Learning Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Local Dependency 0.00 0.00 0.997
Learning/Growth -40.31 14.71 0.003

Comparing the local dependency model (with item-specific effects on easiness only) to a model that assumes a person-specific growth curve across the trials demonstrates that the dependency approach is preferred by both the LOOIC and pseudo-BMA weights. For clarification, the growth curve/learning model treats each trial as measuring a multidimensional trait that changes over trials (e.g., rather than a unidimensional “memory” trait this models something like a “learning” trait that grows over the three trials).

matrix(c(ModelComparisons$LOOIC$Comparison.7[, 1:2],
         ModelComparisons$BMAs$Comparison.7[sapply(row.names(ModelComparisons$LOOIC$Comparison.7), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.7)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Local Dependency", "Multidimensional"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Local Dependency and Multidimensional Models", digits = c(2, 2, 3), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Local Dependency and Multidimensional Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Local Dependency 0.00 0.0 0.998
Multidimensional -45.72 14.7 0.002

Similarly, comparing the local dependency model to a model that assumes each trial corresponds to its own oblique factor/trait reveals a clear preference for the local dependency model. The results of this and the previous model comparisons suggest that the CERAD list learning test is perhaps best thought of as unidimensional with test-specific relationships between items. Part of the advantage of the local dependency effects is that they are dynamic and person-specific. In fact, the local dependency effects function more like a differential item functioning term, as they are an interaction between items and persons. In other words, the expected easiness of “Shore” on trial 3 differs between a person who recalled “Shore” on the two earlier trials and someone who did not. The local dependency effects thus encode both learning (i.e., values of the local dependency variable increase as a word is learned) and dynamic performance (i.e., the expectation for responses changes after each trial).
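To make the dependency coding concrete, the person-specific covariate can be sketched as a lagged indicator of whether a given word was recalled on any earlier trial. The small long-format data frame below is hypothetical (one person, two words), and the study’s actual coding may differ in its details:

```r
# Hypothetical long-format recall data: one person, two words, three trials
long <- data.frame(
  Person = rep(1, 6),
  Word   = rep(c("Shore", "Butter"), each = 3),
  Trial  = rep(1:3, times = 2),
  Recall = c(0, 1, 1,   1, 1, 0)
)

# For each person-word pair (rows ordered by trial), flag whether the word
# was recalled on any previous trial
long$PriorRecall <- ave(long$Recall, long$Person, long$Word,
                        FUN = function(r) as.integer(cumsum(c(0, head(r, -1))) > 0))
long
```

Here “Shore” gets a dependency indicator of 1 only on trial 3 (first recalled on trial 2), while “Butter” carries the indicator on trials 2 and 3 despite not being recalled on trial 3, which is what makes the covariate dynamic.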

matrix(c(ModelComparisons$LOOIC$Comparison.8[, 1:2],
         ModelComparisons$BMAs$Comparison.8[sapply(row.names(ModelComparisons$LOOIC$Comparison.8), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.8)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Local Dependency", "Serial Position"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Local Dependency and Serial Position Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Local Dependency and Serial Position Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Local Dependency 0.0 0.00 0.68
Serial Position -0.8 0.55 0.32

Including a fixed effect for the words’ order of presentation did not produce a clearly superior model to the local dependency model. The LOOICs are not compellingly different between the two models, and in comparing the weights, the local dependency model receives about two-thirds of the overall weight.

matrix(c(ModelComparisons$LOOIC$Comparison.9[, 1:2],
         ModelComparisons$BMAs$Comparison.9[sapply(row.names(ModelComparisons$LOOIC$Comparison.9), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.9)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Local Dependency", "Serial Position by Trial"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Local Dependency and Serial Position (varying by trial) Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Local Dependency and Serial Position (varying by trial) Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Local Dependency 0.00 0.00 0.74
Serial Position by Trial -1.12 0.55 0.26

When the serial position effect is permitted to vary by trial, the LOOIC and pseudo-BMA weights are more clearly in favor of the standard local dependency specification. It was not actually expected that varying the item-position effect by trial would improve model fit; rather, this model was fit to test whether the serial position effect was more pronounced on a particular trial and might therefore require special modeling. No such support was found. Shown here are the conditional effects plots for item position in both serial position models:

conditional_effects(TwoPL_srlps, effects = "ItemPos")

conditional_effects(TwoPL_t3spe, effects = "ItemPos:Time")

As is readily apparent from these plots, item position shows none of the quadratic (primacy/recency) curvature characteristic of a serial position effect. Likewise, the plots show that a word’s position in the list exerts no clear effect on recall probability, and allowing a unique effect for each trial does not help, as the trial-specific estimates are nearly indistinguishable from one another.
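The null serial-position result can be illustrated with a minimal check outside the full IRT model: regress recall on a quadratic in list position and inspect the quadratic term. The data below are simulated with a flat recall profile, mirroring what the conditional effects plots show; a genuine primacy/recency pattern would instead produce a reliably positive quadratic coefficient:

```r
# Simulated flat recall profile: position has no effect on P(recall)
set.seed(1)
d <- data.frame(ItemPos = rep(1:10, each = 100))
d$Recall <- rbinom(nrow(d), 1, plogis(0.2))

# Logistic regression with linear and quadratic position terms
fit <- glm(Recall ~ poly(ItemPos, 2), family = binomial, data = d)
round(summary(fit)$coefficients, 3)
```

With flat data, neither the linear nor the quadratic position term should be distinguishable from zero, which parallels the conclusion drawn from the conditional effects plots.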

matrix(c(ModelComparisons$LOOIC$Comparison.10[, 1:2],
         ModelComparisons$BMAs$Comparison.10[sapply(row.names(ModelComparisons$LOOIC$Comparison.10), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.10)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Local Dependency", "All Item Covariates"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Local Dependency and Complete Item Covariates Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Local Dependency and Complete Item Covariates Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Local Dependency 0.00 0.00 0.94
All Item Covariates -4.98 2.91 0.06

Including all explanatory item covariates in the model did not produce better fit than including only the item-specific dependency effects. This is not a surprising result: although all of the covariates were hypothesized to be relevant, a model is rarely improved simply by including many predictors. As the model summaries for the explanatory item covariates model in the previous section demonstrated, many of the item-level predictors are not credibly different from zero, meaning that performance could potentially be improved by removing some of these variables in search of a more parsimonious model. One clear takeaway from the item covariate model’s summary was that, like the local dependency effects, these variables impacted only easiness, so item effects on discrimination are not evaluated going forward.

matrix(c(ModelComparisons$LOOIC$Comparison.11[, 1:2],
         ModelComparisons$BMAs$Comparison.11[sapply(row.names(ModelComparisons$LOOIC$Comparison.11), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.11)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Local Dependency", "All Item Covariates by Trials"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Local Dependency and Item Covariates varying by Trial Models", digits = c(3, 2, 3), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Local Dependency and Item Covariates varying by Trial Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Local Dependency 0.000 0.00 0.497
All Item Covariates by Trials -0.002 1.23 0.503

Somewhat paradoxically, including more complex interactions improved overall model performance to the point of being comparable to the simpler local dependency model. Part of this improvement is likely due to dropping the effects on discrimination. What is unique about explanatory item covariates is that they essentially partition the variance of a known effect (in this case, item difficulty); they do not add explanatory power so much as explain already-estimated parameters. For this reason, adding item covariates is unlikely to yield a significant overall improvement in the model, since in most cases it only adds complexity. Based on this, the more complex but equally performing item covariate model, rather than the local dependency model, was carried forward to the next steps. Regardless, the point of this model fit and comparison was to examine which effects might be relevant to include in a final model. Shown below are the conditional effects plots for the item covariate effects by trial:

plot(conditional_effects(TwoPL_itmcp, effects = c("FreqSTX:Time", "Concrete:Time", "Density:Time", "Diversity:Time", "AoA:Time", "BOI:Time", "Phonemes:Time", "Ambiguous:Time", "NamingZ:Time")), ask = FALSE)

Using these plots and the posterior estimates of the coefficients, a new model specification was developed that reduced the number of item predictors and allowed some trial-specific effects. For the sake of transparency, a predictor was retained only if zero was not a credible value for its effect; the model’s posterior summaries in the previous section can be referenced to see which predictors included zero in their 95% credible intervals. Whether a unique trial effect was needed was then judged from the plots. Consider the plots for Age of Acquisition (AoA): they make it clear that the estimated effect of AoA is negative in the first and second trials but positive in the final trial. Based on this visualization, the effect of AoA was collapsed across trials 1 and 2, with a separate effect allowed for trial 3.
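The retention rule can be sketched directly: keep a predictor only when both bounds of its 95% credible interval share a sign. The coefficient values below are hypothetical stand-ins, though the column names mirror the interval labels in a brms summary:

```r
# Hypothetical posterior interval summary for four item covariates
post <- data.frame(
  Predictor   = c("FreqHAL", "Concrete", "AoA", "Diversity"),
  `l-95% CI`  = c(0.10, 0.05, -0.30, -0.20),
  `u-95% CI`  = c(0.45, 0.40, -0.01,  0.15),
  check.names = FALSE
)

# Retain a predictor only if its 95% CI excludes zero (bounds share a sign)
keep <- post[sign(post$`l-95% CI`) == sign(post$`u-95% CI`), "Predictor"]
keep
```

In this toy summary, Diversity spans zero and is dropped, while the other three predictors are retained; the same logic, applied to the actual posterior summaries, guided the reduced specification.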

matrix(c(ModelComparisons$LOOIC$Comparison.12[, 1:2],
         ModelComparisons$BMAs$Comparison.12[sapply(row.names(ModelComparisons$LOOIC$Comparison.12), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.12)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("All Item Covariates by Trials", "Reduced Item Covariates"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Complete Item Covariates varying by Trials and Reduced Item Covariate Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Complete Item Covariates varying by Trials and Reduced Item Covariate Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
All Item Covariates by Trials 0.0 0.00 0.7
Reduced Item Covariates -1.1 1.36 0.3

The reduced model does not improve on the more complicated model. From the perspective of the LOOIC, the models are similar to one another; however, the pseudo-BMA weights indicate that the complete model makes better predictions. To help understand why this might be the case, the conditional effects plots can again be examined:

plot(conditional_effects(TwoPL_itmcr, effects = c("Concrete:Trial23", "BOI:Trial23", "Diversity:Trial23", "FreqHAL:Trial23", "AoA:Trial12")), ask = FALSE)

Some needed alterations to the model are clear from these plots. For example, the estimated effects for concreteness and body-object integration are essentially equivalent across all trials, so these variables are likely best summarized by a single effect across trials. Whereas word frequency from the SUBTLEX database was not a significant predictor of word easiness, the frequency value from the HAL database appears to do better. As can be confirmed in the previous section, this model finds that semantic diversity and AoA are both not credibly different from zero, so they can reasonably be dropped from the model. Of note, and again confirmable in the model summary in the previous section, AoA comes close to a credible effect, with an upper credible interval bound of essentially zero.

matrix(c(ModelComparisons$LOOIC$Comparison.13[, 1:2],
         ModelComparisons$BMAs$Comparison.13[sapply(row.names(ModelComparisons$LOOIC$Comparison.13), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.13)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Simplified Item Covariates", "All Item Covariates by Trials"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Complete Item Covariates varying by Trials and Simplified Item Covariate Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Complete Item Covariates varying by Trials and Simplified Item Covariate Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Simplified Item Covariates 0.0 0.00 0.54
All Item Covariates by Trials -0.2 1.19 0.46

A simplified model that fit a single effect for concreteness and BOI, dropped AoA and semantic diversity, and used the HAL word frequency was estimated based on the above results. Compared to the model with all item covariates varying by trial, this simplified model performed largely identically, with no preference indicated by either the LOOIC or the pseudo-BMA weights. In the interest of parsimony, the simplified model was carried forward for further comparisons. Recall that the AoA effect bordered on being credibly different from zero; since the simplified model has fewer predictors, another model was fit that returned AoA to see whether removing the other predictors would clarify its effect.

matrix(c(ModelComparisons$LOOIC$Comparison.14[, 1:2],
         ModelComparisons$BMAs$Comparison.14[sapply(row.names(ModelComparisons$LOOIC$Comparison.14), function(x) {grep(x, attributes(ModelComparisons$BMAs$Comparison.14)$names)})]),
       nrow = 2, ncol = 3, byrow = FALSE,
       dimnames = list(c("Final Model", "Simplified Item Covariates"), c("ELPD Difference", "Standard Error of Difference", "Pseudo-BMA Weight"))) %>%
  kable(caption = "Comparison of the Final and Simplified Item Covariate Models", digits = c(2, 2, 2), align = 'ccc') %>%
  kable_classic(full_width = FALSE, position = "float_right")
Comparison of the Final and Simplified Item Covariate Models
ELPD Difference Standard Error of Difference Pseudo-BMA Weight
Final Model 0.00 0.00 0.62
Simplified Item Covariates -0.71 1.16 0.38

Adding AoA back in causes no loss of parsimony per the LOOIC and does contribute to improved model predictions. As can be confirmed in the previous section, the AoA effect is more clearly non-zero in the final model.

Final Model Details

This final section provides further details on the final item covariate model identified in this study. To begin with, some additional details regarding the model’s performance and fit are provided for readers. The manuscript provides summaries of coefficients as well as item and person fit statistics. In a previous section, this supplemental manuscript provided visual summaries of the coefficients as well. To further enrich the visual understanding of the final data, the following plots describe item and person fit further:


These two plots show the empirical distributions of the differences in log-likelihood between the predicted and observed data under the model. The first plot is grouped by item and reflects item fit. In an ideal model, these differences are normally distributed with a mean of 0. The item plots are manageable since there are only 10 items on the CERAD, but the same is not true for person fit, which would require showing a total of 1,219 individuals. To address the impracticality of plotting all of these individuals, only the most extreme case (the individual with the worst fit) is shown. While the aim of this study was to examine predictors of item traits, it is still possible to visualize person traits (i.e., memory). The following plot shows the ordered distribution of latent trait estimates and their 95% credible intervals for each participant in the study:
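For readers who wish to reproduce this style of fit plot, the underlying quantity can be sketched as the difference in log-likelihood between posterior-predictive replicates and the observed responses for a single item. Everything below is simulated (hypothetical recall probabilities), not the fitted model:

```r
# Hypothetical model-implied recall probabilities for one item across persons
set.seed(2)
n_persons <- 300
n_draws   <- 100
p_fit <- runif(n_persons, 0.3, 0.8)
y_obs <- rbinom(n_persons, 1, 0.55)   # observed responses for the item

# One log-likelihood difference per posterior-predictive replicate:
# sum(log-lik of replicated data) - sum(log-lik of observed data)
ll_diff <- replicate(n_draws, {
  y_rep <- rbinom(n_persons, 1, p_fit)
  sum(dbinom(y_rep, 1, p_fit, log = TRUE)) -
    sum(dbinom(y_obs, 1, p_fit, log = TRUE))
})
```

Plotting ll_diff as a histogram gives the kind of distribution shown for each item: a well-fitting item yields a roughly normal distribution centered near zero.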

The plot demonstrates that, on the whole, the estimates of a person’s latent trait are fairly wide. This is not entirely surprising given that the latent trait is informed only by a random intercept for each person. The explanatory item response theory model will be expanded in a future study to include factors that predict the latent trait, which may reduce uncertainty in its measurement. That said, the largest limit on the accuracy of the latent trait estimates is that memory is measured with just 10 items repeated only 3 times, so person predictors are unlikely to be a major source of error reduction. Note as well that, like the item plots in the manuscript, these latent trait estimates do not include any effect of local dependency, since doing so would greatly increase the number of plots needed to characterize every possible combination of learned items over the 3 trials. It is also potentially useful to visualize all the conditional effects from the final model. These plots are shown next:

There are also some general test descriptives that are usually helpful to examine: test reliability, the expected score function, and test information. As with the other plots, these estimates exclude the local dependency effects, so the plots only approximate the true values. The visualizations follow in that order:


Within IRT, information and reliability are very closely related, and these plots indeed share the same general shape, with peaks slightly below average ability. As both scales make clear, the CERAD word list is not ideal for measuring individuals with either very low or very high memory abilities. In dementia/mild cognitive impairment research, this property may actually be favorable, as the goal is often to characterize mildly impaired memory with greater precision, with relatively little need to measure the extremes carefully. The expected score function is a useful visual analogue for the relationship between raw score and latent trait estimate. The relationship is clearly monotonic and overall very gradual, which means that the raw scores span a relatively wide range of latent trait values.
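As a sketch of the computations behind these curves, 2PL item information is a^2 * P * (1 - P), summed over items to give test information, and reliability can be approximated as I / (I + 1) when the latent trait variance is fixed at 1. The item parameters below are illustrative values, not the fitted CERAD estimates:

```r
# Test information for a 2PL with easiness parameterization:
# P(theta) = plogis(a * theta + b); item info = a^2 * P * (1 - P)
test_information <- function(theta, a, b) {
  sapply(theta, function(th) {
    p <- plogis(a * th + b)
    sum(a^2 * p * (1 - p))
  })
}

theta <- seq(-4, 4, length.out = 81)
info  <- test_information(theta, a = rep(1.2, 10), b = rep(0.3, 10))

# Approximate conditional reliability, assuming latent trait variance of 1:
# SE^2 = 1 / I, so rel = 1 / (1 + SE^2) = I / (I + 1)
reliability <- info / (info + 1)
```

With these illustrative parameters, both curves peak near theta = -b/a (slightly below average ability) and fall off toward the extremes, which is the same qualitative pattern described for the CERAD plots above.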